2025-09-11 19:34:50,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:34:50,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:34:50,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14b7b7e4cdd0>}
2025-09-11 19:34:50,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1111 [DEBUG]: using device: cuda
2025-09-11 19:34:50,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1133 [INFO]: Creating new trainer
2025-09-11 19:34:50,878 baseline-mbpac-noiseperc10-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 19:34:50,878 baseline-mbpac-noiseperc10-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:34:50,885 baseline-mbpac-noiseperc10-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 19:34:51,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1194 [DEBUG]: Starting training session...
2025-09-11 19:34:51,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 1/100
2025-09-11 19:45:00,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:45:00,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:45:58,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 188.75703 ± 116.380
2025-09-11 19:45:58,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [76.45168, 184.19052, 336.40027, 109.10312, 132.13855, 74.58069, 308.91028, 351.5408, 23.919827, 290.33456]
2025-09-11 19:45:58,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 296.0, 225.0, 222.0, 242.0, 187.0, 194.0, 233.0, 131.0, 172.0]
2025-09-11 19:45:58,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (188.76) for latency ExtremeClogL1U23
2025-09-11 19:45:58,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 19 minutes, 30 seconds)
2025-09-11 19:57:44,494 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:57:44,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:58:18,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 123.81370 ± 121.168
2025-09-11 19:58:18,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [-33.398903, 195.42357, 215.49042, 277.05292, 77.06616, 14.948097, 344.26843, 17.33263, 110.73752, 19.216026]
2025-09-11 19:58:18,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 143.0, 176.0, 235.0, 174.0, 27.0, 179.0, 55.0, 108.0, 32.0]
2025-09-11 19:58:18,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 9 minutes, 11 seconds)
2025-09-11 20:09:53,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:09:53,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:10:40,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 138.74075 ± 149.150
2025-09-11 20:10:40,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [29.632164, -5.603208, 2.12215, 266.81128, 258.35962, -1.0366913, 327.2405, -40.568798, 186.1672, 364.28336]
2025-09-11 20:10:40,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [226.0, 49.0, 191.0, 134.0, 123.0, 12.0, 246.0, 162.0, 358.0, 196.0]
2025-09-11 20:10:40,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 18 minutes, 1 second)
2025-09-11 20:22:23,204 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:22:23,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:23:23,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 165.43266 ± 133.450
2025-09-11 20:23:23,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [220.34404, 43.214497, 1.8800926, 55.53279, 240.52116, 306.6606, 48.651653, 53.673363, 402.33908, 281.50934]
2025-09-11 20:23:23,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 220.0, 302.0, 81.0, 142.0, 212.0, 196.0, 81.0, 383.0, 257.0]
2025-09-11 20:23:23,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 24 minutes, 42 seconds)
2025-09-11 20:35:13,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:35:13,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:36:03,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 304.99347 ± 137.550
2025-09-11 20:36:03,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [297.8758, 347.55103, 357.19937, 203.38258, 227.4753, 257.9104, 399.1468, 367.87222, 19.108515, 572.4128]
2025-09-11 20:36:03,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 148.0, 177.0, 131.0, 127.0, 155.0, 212.0, 193.0, 31.0, 415.0]
2025-09-11 20:36:03,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (304.99) for latency ExtremeClogL1U23
2025-09-11 20:36:03,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 22 minutes, 38 seconds)
2025-09-11 20:48:00,331 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:48:00,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:48:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 341.14410 ± 94.487
2025-09-11 20:48:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [373.46442, 366.0357, 194.09657, 303.97113, 256.33676, 391.38492, 297.54678, 467.3026, 250.35495, 510.94727]
2025-09-11 20:48:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 184.0, 112.0, 138.0, 203.0, 207.0, 307.0, 236.0, 143.0, 318.0]
2025-09-11 20:48:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (341.14) for latency ExtremeClogL1U23
2025-09-11 20:48:57,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 44 minutes, 8 seconds)
2025-09-11 21:00:49,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:00:49,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:01:45,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 348.05029 ± 81.543
2025-09-11 21:01:45,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [366.0239, 210.2819, 267.4844, 299.2795, 298.20874, 339.61453, 394.84335, 410.2438, 376.47644, 518.04626]
2025-09-11 21:01:45,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 137.0, 156.0, 186.0, 171.0, 186.0, 205.0, 220.0, 236.0, 313.0]
2025-09-11 21:01:45,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (348.05) for latency ExtremeClogL1U23
2025-09-11 21:01:45,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 39 minutes, 53 seconds)
2025-09-11 21:13:30,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:13:30,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:14:35,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 415.67090 ± 190.950
2025-09-11 21:14:35,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [365.98703, 334.2384, 244.67899, 604.2426, 407.16653, 319.0553, 419.4625, 566.7192, 805.66205, 89.49614]
2025-09-11 21:14:35,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 183.0, 160.0, 259.0, 220.0, 197.0, 215.0, 317.0, 349.0, 242.0]
2025-09-11 21:14:35,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (415.67) for latency ExtremeClogL1U23
2025-09-11 21:14:35,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 36 minutes, 7 seconds)
2025-09-11 21:26:27,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:26:27,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:27:19,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 342.37018 ± 187.941
2025-09-11 21:27:19,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [325.02405, 462.23325, 465.7435, 374.89536, 319.67313, 298.46643, 672.5894, 450.4138, 44.800594, 9.862172]
2025-09-11 21:27:19,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 234.0, 255.0, 216.0, 166.0, 152.0, 312.0, 276.0, 56.0, 21.0]
2025-09-11 21:27:19,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 23 minutes, 39 seconds)
2025-09-11 21:38:57,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:38:57,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:39:47,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 335.37958 ± 140.918
2025-09-11 21:39:47,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [6.5312357, 523.5055, 392.17447, 376.4658, 330.92123, 277.51425, 428.15942, 180.12335, 419.8169, 418.5835]
2025-09-11 21:39:47,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 277.0, 196.0, 188.0, 150.0, 132.0, 180.0, 183.0, 219.0, 243.0]
2025-09-11 21:39:47,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 7 minutes, 10 seconds)
2025-09-11 21:51:37,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:51:37,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:52:33,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 410.35889 ± 210.206
2025-09-11 21:52:33,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [428.21808, 470.18958, 231.46626, 484.261, 325.18274, 48.785397, 418.01404, 374.49655, 401.6425, 921.33246]
2025-09-11 21:52:33,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 233.0, 131.0, 272.0, 141.0, 47.0, 187.0, 165.0, 192.0, 386.0]
2025-09-11 21:52:33,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 52 minutes, 6 seconds)
2025-09-11 22:04:14,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:04:14,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:05:09,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 429.36163 ± 210.278
2025-09-11 22:05:09,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [271.46933, 495.19174, 583.6607, 343.48572, 734.6591, 346.3243, 13.516333, 303.65002, 470.27618, 731.38275]
2025-09-11 22:05:09,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 277.0, 264.0, 163.0, 356.0, 179.0, 25.0, 136.0, 188.0, 300.0]
2025-09-11 22:05:09,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (429.36) for latency ExtremeClogL1U23
2025-09-11 22:05:09,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 36 minutes, 7 seconds)
2025-09-11 22:16:46,213 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:16:46,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:17:39,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 381.13809 ± 136.964
2025-09-11 22:17:39,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [84.78828, 548.41046, 279.4644, 317.6836, 474.43393, 319.45044, 389.53128, 545.25397, 509.40372, 342.96066]
2025-09-11 22:17:39,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [77.0, 222.0, 179.0, 160.0, 274.0, 166.0, 191.0, 239.0, 268.0, 166.0]
2025-09-11 22:17:39,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 17 minutes, 12 seconds)
2025-09-11 22:29:10,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:29:10,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:30:32,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 526.40369 ± 366.829
2025-09-11 22:30:32,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [538.6901, 1367.83, 501.599, 641.313, 340.12894, 889.2556, 122.02183, 15.922414, 345.94614, 501.33032]
2025-09-11 22:30:32,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 665.0, 378.0, 391.0, 167.0, 525.0, 104.0, 25.0, 180.0, 261.0]
2025-09-11 22:30:32,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (526.40) for latency ExtremeClogL1U23
2025-09-11 22:30:32,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 18 hours, 7 minutes, 18 seconds)
2025-09-11 22:42:14,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:42:14,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:43:20,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 542.33563 ± 166.661
2025-09-11 22:43:20,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [640.35284, 358.94318, 545.52875, 739.9523, 634.02814, 690.69275, 352.6139, 371.57483, 321.37146, 768.2982]
2025-09-11 22:43:20,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [235.0, 188.0, 261.0, 308.0, 220.0, 341.0, 171.0, 141.0, 156.0, 329.0]
2025-09-11 22:43:20,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (542.34) for latency ExtremeClogL1U23
2025-09-11 22:43:20,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 34 seconds)
2025-09-11 22:54:53,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:54:53,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:56:10,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 676.77148 ± 233.180
2025-09-11 22:56:10,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [461.87628, 637.30835, 468.8647, 979.1785, 392.37363, 594.9699, 681.7826, 645.1202, 1198.0494, 708.19147]
2025-09-11 22:56:10,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [208.0, 307.0, 172.0, 432.0, 202.0, 215.0, 284.0, 303.0, 390.0, 322.0]
2025-09-11 22:56:10,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (676.77) for latency ExtremeClogL1U23
2025-09-11 22:56:10,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 48 minutes, 49 seconds)
2025-09-11 23:07:44,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:07:44,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:09:03,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 670.41663 ± 445.183
2025-09-11 23:09:03,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1907.7231, 390.67444, 616.7183, 908.9357, 620.8673, 412.77643, 345.5947, 601.0303, 585.0611, 314.78485]
2025-09-11 23:09:03,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [759.0, 181.0, 293.0, 343.0, 308.0, 188.0, 171.0, 267.0, 247.0, 146.0]
2025-09-11 23:09:03,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 40 minutes, 44 seconds)
2025-09-11 23:20:43,539 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:20:43,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:22:02,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 636.18555 ± 296.472
2025-09-11 23:22:02,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [481.70956, 508.7762, 1036.4335, 1343.6691, 437.29523, 574.6689, 618.02673, 331.51074, 436.78198, 592.98334]
2025-09-11 23:22:02,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 232.0, 455.0, 493.0, 201.0, 229.0, 292.0, 188.0, 232.0, 264.0]
2025-09-11 23:22:02,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 35 minutes, 50 seconds)
2025-09-11 23:33:29,706 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:33:29,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:34:48,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 741.74652 ± 272.511
2025-09-11 23:34:48,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [498.76825, 788.5185, 518.6151, 431.31018, 1316.1646, 651.92224, 681.2366, 999.29333, 514.7573, 1016.8795]
2025-09-11 23:34:48,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 295.0, 213.0, 210.0, 428.0, 233.0, 331.0, 319.0, 244.0, 391.0]
2025-09-11 23:34:48,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (741.75) for latency ExtremeClogL1U23
2025-09-11 23:34:48,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 17 hours, 20 minutes, 59 seconds)
2025-09-11 23:46:20,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:46:20,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:47:22,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 545.08441 ± 165.177
2025-09-11 23:47:22,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [553.2415, 360.14328, 529.3682, 443.48782, 492.5814, 583.7051, 787.02234, 372.05295, 895.59393, 433.64798]
2025-09-11 23:47:22,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [243.0, 159.0, 207.0, 175.0, 208.0, 262.0, 253.0, 180.0, 371.0, 215.0]
2025-09-11 23:47:22,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 17 hours, 4 minutes, 25 seconds)
2025-09-11 23:58:54,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:58:54,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:00:15,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 723.89844 ± 194.695
2025-09-12 00:00:15,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [975.229, 795.7056, 582.80707, 658.0489, 711.02655, 405.0765, 1010.45605, 940.09937, 494.93503, 665.60034]
2025-09-12 00:00:15,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [368.0, 307.0, 241.0, 259.0, 289.0, 172.0, 415.0, 339.0, 257.0, 268.0]
2025-09-12 00:00:15,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 52 minutes, 23 seconds)
2025-09-12 00:12:44,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:12:44,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:14:41,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1082.38318 ± 399.344
2025-09-12 00:14:41,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1285.3945, 1116.5574, 937.8801, 935.5779, 1207.7693, 517.511, 1411.6105, 1918.1124, 479.36264, 1014.05707]
2025-09-12 00:14:41,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [483.0, 423.0, 391.0, 342.0, 413.0, 207.0, 597.0, 786.0, 253.0, 352.0]
2025-09-12 00:14:41,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1082.38) for latency ExtremeClogL1U23
2025-09-12 00:14:41,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 17 hours, 3 minutes, 48 seconds)
2025-09-12 00:25:49,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:25:49,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:28:04,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1288.90955 ± 797.912
2025-09-12 00:28:04,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [711.3631, 2802.2668, 532.19434, 1005.7145, 1286.6499, 975.3314, 513.3732, 564.0086, 2147.0808, 2351.1123]
2025-09-12 00:28:04,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [260.0, 1000.0, 235.0, 438.0, 461.0, 365.0, 265.0, 282.0, 741.0, 853.0]
2025-09-12 00:28:04,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1288.91) for latency ExtremeClogL1U23
2025-09-12 00:28:04,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 57 minutes, 2 seconds)
2025-09-12 00:38:59,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:38:59,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:40:29,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 864.04327 ± 807.179
2025-09-12 00:40:29,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1346.9719, 602.82086, 1808.0743, 973.93616, 200.5348, 10.820799, 49.047066, 2514.779, 29.84824, 1103.599]
2025-09-12 00:40:29,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [485.0, 248.0, 590.0, 391.0, 119.0, 21.0, 47.0, 1000.0, 37.0, 365.0]
2025-09-12 00:40:29,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 16 hours, 38 minutes, 25 seconds)
2025-09-12 00:51:57,437 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:51:57,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:54:34,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1509.47107 ± 1022.038
2025-09-12 00:54:34,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2453.4392, 1079.676, 63.823246, 1163.6791, 2802.0054, 1528.9474, 722.9677, 11.114972, 2417.5674, 2851.4912]
2025-09-12 00:54:34,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 456.0, 67.0, 464.0, 1000.0, 532.0, 298.0, 21.0, 856.0, 1000.0]
2025-09-12 00:54:34,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1509.47) for latency ExtremeClogL1U23
2025-09-12 00:54:34,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 47 minutes, 55 seconds)
2025-09-12 01:06:53,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:53,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:09:58,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1949.97888 ± 852.696
2025-09-12 01:09:58,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2348.7383, 438.55618, 3083.3945, 2788.3047, 2012.9696, 1598.8536, 851.2663, 3043.1538, 1392.711, 1941.8402]
2025-09-12 01:09:58,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [873.0, 192.0, 1000.0, 924.0, 723.0, 586.0, 305.0, 1000.0, 487.0, 689.0]
2025-09-12 01:09:58,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1949.98) for latency ExtremeClogL1U23
2025-09-12 01:09:58,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 11 minutes, 47 seconds)
2025-09-12 01:21:07,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:21:07,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:23:33,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1641.71814 ± 995.130
2025-09-12 01:23:33,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1046.712, 267.63577, 1881.4576, 3322.0042, 427.84583, 2470.4922, 3003.6755, 1895.2869, 1086.187, 1015.8849]
2025-09-12 01:23:33,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [398.0, 144.0, 553.0, 1000.0, 219.0, 766.0, 938.0, 580.0, 379.0, 333.0]
2025-09-12 01:23:33,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 45 minutes, 29 seconds)
2025-09-12 01:35:02,633 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:35:02,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:38:11,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2052.10913 ± 857.018
2025-09-12 01:38:11,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2832.1208, 1772.6013, 1530.8745, 2124.4558, 3164.9817, 3320.214, 2612.8252, 1089.8998, 696.097, 1377.0206]
2025-09-12 01:38:11,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [899.0, 636.0, 537.0, 726.0, 1000.0, 1000.0, 958.0, 414.0, 266.0, 423.0]
2025-09-12 01:38:11,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2052.11) for latency ExtremeClogL1U23
2025-09-12 01:38:11,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 49 minutes, 31 seconds)
2025-09-12 01:49:31,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:49:31,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:52:57,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2207.66797 ± 1052.365
2025-09-12 01:52:57,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [592.93726, 1011.93964, 2092.1409, 2877.0403, 2888.5771, 2843.894, 2949.3962, 426.67175, 3315.265, 3078.817]
2025-09-12 01:52:57,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 367.0, 750.0, 1000.0, 1000.0, 1000.0, 1000.0, 208.0, 1000.0, 1000.0]
2025-09-12 01:52:57,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2207.67) for latency ExtremeClogL1U23
2025-09-12 01:52:57,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 9 minutes, 7 seconds)
2025-09-12 02:05:15,429 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:15,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:08:19,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2112.54736 ± 639.105
2025-09-12 02:08:19,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1812.1302, 2818.6543, 2859.5925, 2301.77, 1650.5469, 1873.379, 2730.8433, 1106.8273, 2762.1782, 1209.5526]
2025-09-12 02:08:19,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [573.0, 868.0, 884.0, 744.0, 536.0, 613.0, 923.0, 363.0, 850.0, 392.0]
2025-09-12 02:08:19,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 12 minutes, 35 seconds)
2025-09-12 02:19:21,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:19:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:22:45,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2377.34155 ± 661.512
2025-09-12 02:22:45,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2099.5583, 2215.6895, 3240.6355, 3220.7815, 2148.3652, 1499.4385, 1239.2529, 3164.6584, 2645.2053, 2299.8315]
2025-09-12 02:22:45,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [646.0, 624.0, 1000.0, 893.0, 741.0, 549.0, 459.0, 1000.0, 867.0, 704.0]
2025-09-12 02:22:45,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2377.34) for latency ExtremeClogL1U23
2025-09-12 02:22:45,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 44 minutes, 24 seconds)
2025-09-12 02:34:10,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:34:10,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:36:28,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1617.77600 ± 1268.251
2025-09-12 02:36:28,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [321.83, 10.702255, 2569.4688, 1486.8167, 1307.648, 603.49255, 3467.9456, 642.66925, 3931.4675, 1835.7202]
2025-09-12 02:36:28,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 21.0, 828.0, 456.0, 485.0, 220.0, 1000.0, 229.0, 1000.0, 554.0]
2025-09-12 02:36:28,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 31 minutes, 41 seconds)
2025-09-12 02:48:47,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:48:47,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:51:29,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2060.59668 ± 1466.754
2025-09-12 02:51:29,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [537.72015, 3580.01, 3505.321, 1183.1472, 74.13524, 8.714856, 1934.3063, 2268.6892, 3586.7527, 3927.1692]
2025-09-12 02:51:29,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 1000.0, 1000.0, 364.0, 70.0, 21.0, 597.0, 632.0, 1000.0, 1000.0]
2025-09-12 02:51:29,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 22 minutes, 15 seconds)
2025-09-12 03:02:55,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:02:55,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:05:03,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1516.04272 ± 1175.760
2025-09-12 03:05:03,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3678.0334, 2033.837, 1606.9386, 1223.6862, 31.608389, 11.505767, 812.26154, 1391.9585, 3392.4385, 978.1593]
2025-09-12 03:05:03,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 593.0, 552.0, 362.0, 41.0, 22.0, 307.0, 459.0, 1000.0, 351.0]
2025-09-12 03:05:03,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 51 minutes, 36 seconds)
2025-09-12 03:16:06,751 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:16:06,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:19:26,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2510.17432 ± 1164.376
2025-09-12 03:19:26,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [301.5109, 1782.5876, 3328.455, 3576.6401, 3626.4448, 3217.4585, 1214.994, 3439.4392, 1313.7819, 3300.431]
2025-09-12 03:19:26,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 478.0, 1000.0, 1000.0, 1000.0, 1000.0, 350.0, 929.0, 411.0, 1000.0]
2025-09-12 03:19:26,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2510.17) for latency ExtremeClogL1U23
2025-09-12 03:19:26,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 24 minutes, 35 seconds)
2025-09-12 03:31:18,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:31:18,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:34:37,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2524.77466 ± 1199.916
2025-09-12 03:34:37,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [203.83414, 3555.4065, 1975.2609, 3016.6797, 3393.157, 3060.5178, 3722.182, 2429.2952, 3403.5173, 487.89417]
2025-09-12 03:34:37,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [103.0, 1000.0, 621.0, 847.0, 1000.0, 787.0, 1000.0, 728.0, 1000.0, 197.0]
2025-09-12 03:34:37,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2524.77) for latency ExtremeClogL1U23
2025-09-12 03:34:37,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 20 minutes, 4 seconds)
2025-09-12 03:46:30,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:30,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:49:29,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2343.62866 ± 1117.429
2025-09-12 03:49:29,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3508.535, 1225.1055, 1126.3478, 3650.9263, 3328.2688, 410.32938, 3488.3652, 2539.3284, 2575.632, 1583.4501]
2025-09-12 03:49:29,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 354.0, 379.0, 1000.0, 864.0, 164.0, 1000.0, 679.0, 680.0, 414.0]
2025-09-12 03:49:29,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 20 minutes, 1 second)
2025-09-12 04:00:24,701 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:00:24,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:03:48,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2404.55615 ± 1274.075
2025-09-12 04:03:48,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3146.9482, 458.01822, 2075.9475, 3505.6218, 235.93173, 1060.9879, 3071.736, 3645.1406, 3363.3755, 3481.854]
2025-09-12 04:03:48,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 184.0, 655.0, 1000.0, 126.0, 330.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:03:48,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 56 minutes, 41 seconds)
2025-09-12 04:15:26,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:15:26,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:19:22,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2939.61353 ± 952.410
2025-09-12 04:19:22,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2999.435, 1627.4248, 3083.2925, 3458.5837, 3650.3445, 3349.6587, 3342.58, 640.2148, 3637.1736, 3607.4253]
2025-09-12 04:19:22,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [863.0, 512.0, 1000.0, 1000.0, 1000.0, 1000.0, 952.0, 240.0, 1000.0, 1000.0]
2025-09-12 04:19:22,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (2939.61) for latency ExtremeClogL1U23
2025-09-12 04:19:22,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 6 minutes, 44 seconds)
2025-09-12 04:31:37,914 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:31:37,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:35:10,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2646.63354 ± 1303.352
2025-09-12 04:35:10,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3632.2869, 2759.504, 3655.8562, 217.24509, 3445.5383, 1.2653126, 3210.712, 3447.269, 2761.6177, 3335.0413]
2025-09-12 04:35:10,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 835.0, 1000.0, 151.0, 977.0, 13.0, 933.0, 1000.0, 761.0, 1000.0]
2025-09-12 04:35:10,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 8 minutes, 40 seconds)
2025-09-12 04:46:42,787 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:46:42,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:50:02,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2455.93408 ± 977.084
2025-09-12 04:50:02,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3433.414, 3003.9348, 3467.8025, 853.2176, 2238.3906, 1717.7651, 1767.1267, 3442.4578, 1172.9724, 3462.26]
2025-09-12 04:50:02,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 872.0, 1000.0, 292.0, 644.0, 528.0, 531.0, 1000.0, 371.0, 1000.0]
2025-09-12 04:50:02,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 49 minutes, 54 seconds)
2025-09-12 05:01:57,048 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:01:57,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:04:34,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1849.73206 ± 1020.622
2025-09-12 05:04:34,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1632.7639, 68.38121, 1412.4628, 1456.3442, 3614.8542, 2672.5654, 2096.104, 671.2834, 1757.9641, 3114.597]
2025-09-12 05:04:34,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [517.0, 86.0, 425.0, 458.0, 1000.0, 804.0, 656.0, 269.0, 511.0, 1000.0]
2025-09-12 05:04:34,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 30 minutes, 53 seconds)
2025-09-12 05:15:44,721 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:15:44,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:19:09,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2623.64038 ± 1309.510
2025-09-12 05:19:09,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [127.332245, 2155.0999, 3626.5452, 3056.169, 3850.3367, 3880.273, 1702.9335, 3453.8254, 3693.7153, 690.17426]
2025-09-12 05:19:09,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [88.0, 668.0, 1000.0, 909.0, 1000.0, 1000.0, 500.0, 1000.0, 1000.0, 271.0]
2025-09-12 05:19:09,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 19 minutes, 3 seconds)
2025-09-12 05:30:28,069 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:30:28,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:33:33,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2210.64697 ± 1127.882
2025-09-12 05:33:33,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3440.448, 2101.311, 1161.5426, 3437.6475, 2216.2593, 386.32278, 3529.2866, 705.43585, 1854.1321, 3274.084]
2025-09-12 05:33:33,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 643.0, 388.0, 1000.0, 713.0, 174.0, 1000.0, 230.0, 562.0, 1000.0]
2025-09-12 05:33:33,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 13 hours, 50 minutes, 49 seconds)
2025-09-12 05:44:52,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:44:52,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:49:09,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3377.98389 ± 499.342
2025-09-12 05:49:09,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3724.021, 2114.2268, 3663.5989, 2739.788, 3497.8394, 3614.7083, 3684.3308, 3597.73, 3577.5344, 3566.0608]
2025-09-12 05:49:09,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 601.0, 1000.0, 754.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:49:09,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3377.98) for latency ExtremeClogL1U23
2025-09-12 05:49:09,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 33 minutes, 47 seconds)
2025-09-12 06:01:00,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:01:00,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:03:12,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1641.59241 ± 1258.599
2025-09-12 06:03:12,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [317.8978, 1889.5469, 3809.1914, 2230.894, 1369.9023, 484.46362, 207.58778, 2294.556, 333.3588, 3478.5251]
2025-09-12 06:03:12,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 546.0, 1000.0, 671.0, 446.0, 200.0, 102.0, 628.0, 153.0, 959.0]
2025-09-12 06:03:12,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 10 minutes, 2 seconds)
2025-09-12 06:14:44,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:14:44,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:18:01,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2496.31494 ± 1346.423
2025-09-12 06:18:01,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3465.5652, 2637.1643, 10.702103, 3596.2754, 1722.3512, 46.458237, 3366.407, 3398.8186, 3602.41, 3116.9985]
2025-09-12 06:18:01,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 707.0, 21.0, 1000.0, 481.0, 50.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:18:01,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 58 minutes, 30 seconds)
2025-09-12 06:29:09,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:29:09,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:32:47,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2892.71802 ± 1056.925
2025-09-12 06:32:47,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3629.398, 3847.9019, 1966.1532, 1073.9644, 3757.6204, 3570.5725, 3770.7756, 3404.1255, 2832.1077, 1074.5598]
2025-09-12 06:32:47,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 587.0, 376.0, 1000.0, 1000.0, 1000.0, 897.0, 777.0, 342.0]
2025-09-12 06:32:47,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 45 minutes, 48 seconds)
2025-09-12 06:44:55,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:44:55,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:48:13,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2778.28076 ± 1508.571
2025-09-12 06:48:13,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [205.47389, 3823.101, 1547.6897, 3919.6917, 204.43391, 4078.0156, 3812.4436, 3888.4905, 2334.8188, 3968.6504]
2025-09-12 06:48:13,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 1000.0, 430.0, 1000.0, 101.0, 1000.0, 1000.0, 1000.0, 633.0, 1000.0]
2025-09-12 06:48:13,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 41 minutes, 37 seconds)
2025-09-12 06:59:17,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:59:17,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:03:21,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3258.08667 ± 847.806
2025-09-12 07:03:21,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1700.6968, 1485.554, 3479.3345, 3537.182, 3779.457, 3388.113, 3726.997, 3902.999, 3754.1592, 3826.3757]
2025-09-12 07:03:21,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [512.0, 438.0, 1000.0, 1000.0, 1000.0, 946.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:03:21,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 22 minutes, 1 second)
2025-09-12 07:14:49,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:14:49,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:18:32,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3003.75928 ± 1358.828
2025-09-12 07:18:32,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3673.2515, 647.1428, 3860.11, 3146.7637, 3721.4204, 3717.3726, 7.0994015, 3712.5374, 3707.688, 3844.2068]
2025-09-12 07:18:32,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 239.0, 1000.0, 830.0, 1000.0, 1000.0, 21.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:18:32,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 18 minutes, 22 seconds)
2025-09-12 07:30:07,100 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:30:07,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:34:04,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3226.06934 ± 1102.239
2025-09-12 07:34:04,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3780.1536, 3634.0115, 1110.9471, 951.78204, 3591.5073, 3778.1975, 3755.4353, 3916.5784, 3905.4106, 3836.67]
2025-09-12 07:34:04,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 354.0, 329.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:34:04,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 10 minutes, 9 seconds)
2025-09-12 07:45:25,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:25,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:48:42,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2578.98755 ± 1474.123
2025-09-12 07:48:42,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1047.9402, 2832.3127, 3218.537, 3585.5374, 148.97362, 3649.9807, 3814.1057, 3735.9175, 7.194599, 3749.376]
2025-09-12 07:48:42,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [333.0, 809.0, 891.0, 1000.0, 101.0, 1000.0, 1000.0, 1000.0, 18.0, 1000.0]
2025-09-12 07:48:42,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 53 minutes, 38 seconds)
2025-09-12 08:00:56,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:00:56,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:03:10,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1634.33582 ± 1459.725
2025-09-12 08:03:10,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1001.05566, 3714.4795, 3686.2998, 2213.1177, 1377.5851, 3552.667, 316.56885, 216.34665, 8.193572, 257.0442]
2025-09-12 08:03:10,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [358.0, 1000.0, 1000.0, 622.0, 425.0, 1000.0, 162.0, 119.0, 18.0, 124.0]
2025-09-12 08:03:10,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 29 minutes, 26 seconds)
2025-09-12 08:15:01,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:01,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:17:39,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2014.05371 ± 1409.073
2025-09-12 08:17:39,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1839.1796, 3740.5393, 2386.9094, 3601.5537, 3692.6685, 1190.0984, 66.93951, 3007.164, 98.96424, 516.52124]
2025-09-12 08:17:39,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [542.0, 1000.0, 666.0, 1000.0, 1000.0, 382.0, 66.0, 827.0, 91.0, 216.0]
2025-09-12 08:17:39,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 8 minutes, 39 seconds)
2025-09-12 08:28:12,700 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:28:12,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:31:37,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2816.95166 ± 1350.446
2025-09-12 08:31:37,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3549.7178, 1637.9062, 3878.4758, 4041.673, 3876.4329, 2021.5753, 494.13406, 3776.1565, 812.96027, 4080.4849]
2025-09-12 08:31:37,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [938.0, 468.0, 1000.0, 1000.0, 1000.0, 539.0, 199.0, 1000.0, 255.0, 1000.0]
2025-09-12 08:31:37,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 43 minutes, 3 seconds)
2025-09-12 08:43:50,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:43:50,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:47:38,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3177.47412 ± 1448.906
2025-09-12 08:47:38,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [268.72775, 295.45367, 3859.1326, 3909.6145, 3842.449, 3987.1921, 3960.4182, 3990.454, 3811.8699, 3849.431]
2025-09-12 08:47:38,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 157.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 992.0, 1000.0]
2025-09-12 08:47:38,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 32 minutes, 42 seconds)
2025-09-12 08:59:09,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:59:09,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:02:34,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2818.23853 ± 1393.671
2025-09-12 09:02:34,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2271.3113, 3948.641, 3967.0737, 3619.645, 3770.9941, 9.385679, 1971.3448, 818.6143, 3805.2559, 4000.1194]
2025-09-12 09:02:34,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [589.0, 1000.0, 1000.0, 1000.0, 1000.0, 23.0, 559.0, 280.0, 1000.0, 1000.0]
2025-09-12 09:02:34,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 20 minutes, 30 seconds)
2025-09-12 09:13:40,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:13:40,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:17:38,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3334.26807 ± 1145.707
2025-09-12 09:17:38,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3836.766, 4034.9602, 1867.0278, 3737.3794, 3823.6067, 3847.4897, 3772.6646, 3876.754, 425.60825, 4120.422]
2025-09-12 09:17:38,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 525.0, 1000.0, 944.0, 1000.0, 1000.0, 1000.0, 167.0, 1000.0]
2025-09-12 09:17:38,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 10 minutes, 39 seconds)
2025-09-12 09:29:40,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:29:40,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:33:03,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2817.03003 ± 1570.586
2025-09-12 09:33:03,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3645.6682, 1136.58, 3832.9836, 234.07089, 3840.6006, 3869.2178, 3826.411, 3980.3315, 3806.467, -2.0293078]
2025-09-12 09:33:03,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 333.0, 1000.0, 99.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 16.0]
2025-09-12 09:33:03,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 3 minutes, 15 seconds)
2025-09-12 09:44:14,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:44:14,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:48:06,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3067.41284 ± 991.920
2025-09-12 09:48:06,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3054.7812, 3625.6748, 898.5063, 3464.4385, 1361.1676, 3492.6953, 3707.422, 3685.4631, 3664.368, 3719.6128]
2025-09-12 09:48:06,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [880.0, 1000.0, 282.0, 956.0, 442.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:48:06,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 56 minutes, 35 seconds)
2025-09-12 10:00:04,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:00:04,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:04:03,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3422.95557 ± 1168.558
2025-09-12 10:04:03,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3087.5796, 3974.1858, 3754.6628, 3906.6382, 4065.4636, 3838.2407, 8.41244, 3962.0073, 3656.865, 3975.498]
2025-09-12 10:04:03,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [765.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 21.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:04:03,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3422.96) for latency ExtremeClogL1U23
2025-09-12 10:04:03,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 40 minutes, 45 seconds)
2025-09-12 10:14:49,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:14:49,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:18:17,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2955.95752 ± 1553.439
2025-09-12 10:18:17,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [23.040682, 3780.2275, -2.21232, 4096.5312, 4146.938, 3985.1394, 3929.5728, 2916.402, 2609.3477, 4074.588]
2025-09-12 10:18:17,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 1000.0, 14.0, 1000.0, 1000.0, 1000.0, 1000.0, 768.0, 677.0, 1000.0]
2025-09-12 10:18:17,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 20 minutes, 16 seconds)
2025-09-12 10:29:57,352 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:29:57,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:33:21,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2847.25342 ± 1356.311
2025-09-12 10:33:21,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3617.4321, 3880.802, 4026.5776, 4031.6511, 1326.1288, 756.3855, 2026.8351, 4017.0217, 844.9352, 3944.766]
2025-09-12 10:33:21,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 968.0, 1000.0, 399.0, 244.0, 581.0, 1000.0, 277.0, 1000.0]
2025-09-12 10:33:21,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 5 minutes, 11 seconds)
2025-09-12 10:45:23,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:45:23,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:48:36,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2730.78931 ± 1764.079
2025-09-12 10:48:36,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [0.6525168, 3817.8162, 4001.361, 3975.9211, 3715.1006, 6.462987, 3637.609, 4131.874, 3895.5024, 125.59377]
2025-09-12 10:48:36,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 1000.0, 1000.0, 1000.0, 1000.0, 19.0, 1000.0, 1000.0, 1000.0, 83.0]
2025-09-12 10:48:36,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 48 minutes, 53 seconds)
2025-09-12 10:59:37,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:59:37,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:03:30,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3448.50342 ± 1023.770
2025-09-12 11:03:30,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2516.0374, 3969.9863, 4092.2258, 4157.151, 4249.6724, 4068.4617, 3973.454, 4154.5723, 1939.8279, 1363.6481]
2025-09-12 11:03:30,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [672.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 545.0, 370.0]
2025-09-12 11:03:30,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3448.50) for latency ExtremeClogL1U23
2025-09-12 11:03:30,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 32 minutes, 46 seconds)
2025-09-12 11:15:27,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:15:27,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:18:46,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2877.93896 ± 1535.859
2025-09-12 11:18:46,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [14.065705, 3732.5608, 3620.6765, 4043.2341, 273.48578, 4156.124, 3587.433, 1573.9238, 3686.6184, 4091.2659]
2025-09-12 11:18:46,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 925.0, 888.0, 1000.0, 121.0, 1000.0, 893.0, 432.0, 1000.0, 1000.0]
2025-09-12 11:18:46,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 13 minutes, 8 seconds)
2025-09-12 11:29:57,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:29:57,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:32:50,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2343.03613 ± 1743.562
2025-09-12 11:32:50,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3704.9714, 2946.1433, 3991.6147, 844.68243, 51.204727, 110.95892, 8.466784, 3813.692, 3889.582, 4069.0454]
2025-09-12 11:32:50,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 817.0, 1000.0, 264.0, 68.0, 99.0, 21.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:32:50,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 57 minutes, 2 seconds)
2025-09-12 11:43:27,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:43:27,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:46:39,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2732.71826 ± 1378.733
2025-09-12 11:46:39,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [75.817505, 2509.5496, 4003.3225, 3931.954, 260.98856, 2775.6438, 3128.6675, 3925.7747, 2924.8413, 3790.622]
2025-09-12 11:46:39,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [82.0, 660.0, 1000.0, 1000.0, 99.0, 702.0, 790.0, 1000.0, 724.0, 1000.0]
2025-09-12 11:46:39,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 34 minutes, 24 seconds)
2025-09-12 11:58:28,274 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:58:28,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:01:40,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2835.83203 ± 1469.831
2025-09-12 12:01:40,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2788.1353, 2674.7078, 4413.8706, 3731.6606, 387.10983, 4275.896, 4064.36, 688.9939, 1245.9253, 4087.6594]
2025-09-12 12:01:40,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [754.0, 674.0, 1000.0, 875.0, 142.0, 1000.0, 959.0, 296.0, 341.0, 1000.0]
2025-09-12 12:01:40,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 18 minutes, 19 seconds)
2025-09-12 12:13:20,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:13:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:15:38,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1918.68457 ± 1562.132
2025-09-12 12:15:38,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3908.621, 1952.202, 4.573105, 3887.2424, 4040.225, 1061.3151, 132.31976, 2826.917, 132.16028, 1241.2704]
2025-09-12 12:15:38,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 534.0, 20.0, 1000.0, 1000.0, 307.0, 86.0, 704.0, 82.0, 373.0]
2025-09-12 12:15:38,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 58 minutes, 17 seconds)
2025-09-12 12:26:51,016 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:26:51,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:30:29,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3218.34058 ± 984.694
2025-09-12 12:30:29,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4085.75, 3961.6282, 1815.5024, 3605.0598, 1604.8206, 3921.6467, 4182.969, 2674.6426, 4151.6187, 2179.768]
2025-09-12 12:30:29,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 467.0, 880.0, 452.0, 1000.0, 1000.0, 687.0, 1000.0, 565.0]
2025-09-12 12:30:29,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 41 minutes, 34 seconds)
2025-09-12 12:42:01,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:42:01,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:45:19,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2858.50342 ± 1502.611
2025-09-12 12:45:19,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3944.164, 1935.8268, 3935.7742, 4132.039, 4043.5054, 35.37862, 1151.8931, 3775.1006, 1293.2773, 4338.0737]
2025-09-12 12:45:19,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 492.0, 1000.0, 1000.0, 1000.0, 42.0, 414.0, 1000.0, 354.0, 1000.0]
2025-09-12 12:45:19,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 31 minutes, 29 seconds)
2025-09-12 12:56:28,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:56:28,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:00:50,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3900.60620 ± 483.764
2025-09-12 13:00:50,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4030.4053, 4076.6494, 4160.1846, 3969.9165, 3952.532, 2464.7773, 4186.966, 4013.9832, 4070.61, 4080.0408]
2025-09-12 13:00:50,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 627.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:00:50,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (3900.61) for latency ExtremeClogL1U23
2025-09-12 13:00:50,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 25 minutes, 44 seconds)
2025-09-12 13:11:55,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:11:55,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:15:18,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2925.49658 ± 1556.642
2025-09-12 13:15:18,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3897.788, 180.71237, 3065.7659, 226.98886, 4078.9094, 1543.6765, 3974.341, 4122.761, 4114.94, 4049.0847]
2025-09-12 13:15:18,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 92.0, 844.0, 121.0, 1000.0, 428.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:15:18,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 8 minutes, 10 seconds)
2025-09-12 13:26:43,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:26:43,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:29:50,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2734.19067 ± 1429.462
2025-09-12 13:29:50,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2167.2063, 3962.0083, 1566.1392, 4055.4233, 4167.3813, 1885.7001, 4026.2537, 4094.6467, 1413.5938, 3.5527928]
2025-09-12 13:29:50,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [638.0, 1000.0, 396.0, 1000.0, 1000.0, 491.0, 1000.0, 1000.0, 397.0, 14.0]
2025-09-12 13:29:50,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 56 minutes, 13 seconds)
2025-09-12 13:41:36,585 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:41:36,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:45:14,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3229.73193 ± 1603.112
2025-09-12 13:45:14,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3952.7695, 3867.5515, 4118.574, 4154.241, 3937.6738, 36.117886, 4175.465, 36.20018, 3768.2537, 4250.471]
2025-09-12 13:45:14,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 45.0, 1000.0, 43.0, 1000.0, 1000.0]
2025-09-12 13:45:14,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 43 minutes, 52 seconds)
2025-09-12 13:56:54,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:56:54,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:00:44,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3362.11523 ± 1215.797
2025-09-12 14:00:44,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3449.2427, 4106.179, 3937.617, 3804.7004, 4244.1777, 3919.7336, 3888.2605, 2193.3225, 107.64396, 3970.2734]
2025-09-12 14:00:44,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [837.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 620.0, 95.0, 1000.0]
2025-09-12 14:00:44,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 31 minutes, 49 seconds)
2025-09-12 14:12:12,107 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:12:12,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:15:48,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3120.53857 ± 1471.597
2025-09-12 14:15:48,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [2475.8555, 655.398, 3982.9482, 16.49715, 3999.9114, 3980.0183, 4013.9255, 3996.7651, 4106.8135, 3977.252]
2025-09-12 14:15:48,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [714.0, 217.0, 1000.0, 27.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:15:48,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 14 minutes, 52 seconds)
2025-09-12 14:27:33,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:33,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:31:12,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3156.74487 ± 967.918
2025-09-12 14:31:12,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3447.516, 3742.3025, 4044.0815, 4135.7476, 1304.4933, 3062.4836, 2291.8655, 1765.3639, 3940.5664, 3833.0283]
2025-09-12 14:31:12,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [857.0, 1000.0, 1000.0, 1000.0, 384.0, 814.0, 646.0, 501.0, 1000.0, 1000.0]
2025-09-12 14:31:12,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 3 minutes, 37 seconds)
2025-09-12 14:42:20,859 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:42:20,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:44:53,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2287.04419 ± 1809.505
2025-09-12 14:44:53,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4413.0503, 4361.185, 1411.6572, 14.569931, 2021.4288, 1745.6302, 4461.9146, 4177.5815, 257.9634, 5.4600487]
2025-09-12 14:44:53,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 423.0, 24.0, 584.0, 466.0, 1000.0, 1000.0, 93.0, 17.0]
2025-09-12 14:44:53,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 45 minutes, 8 seconds)
2025-09-12 14:56:32,273 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:56:32,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:00:10,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3241.12988 ± 1484.683
2025-09-12 15:00:10,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4145.179, 2098.7207, 1177.7993, 4059.7544, 4048.1511, 4290.6875, 4249.313, 4300.3423, 4035.9563, 5.394329]
2025-09-12 15:00:10,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 549.0, 346.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 20.0]
2025-09-12 15:00:10,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 29 minutes, 45 seconds)
2025-09-12 15:10:43,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:10:43,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:15:18,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 4093.41479 ± 112.115
2025-09-12 15:15:18,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4189.6714, 3897.896, 4164.8887, 4140.1167, 4098.9746, 3987.555, 4013.7527, 4206.2363, 4257.9546, 3977.1003]
2025-09-12 15:15:18,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:15:18,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (4093.41) for latency ExtremeClogL1U23
2025-09-12 15:15:18,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 13 minutes, 30 seconds)
2025-09-12 15:26:45,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:26:45,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:29:28,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2350.60767 ± 1792.472
2025-09-12 15:29:28,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4203.71, 4055.7905, 10.805238, 4023.7307, 1428.556, 160.72966, 4146.0615, 4036.5083, 227.602, 1212.5842]
2025-09-12 15:29:28,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 23.0, 1000.0, 379.0, 94.0, 1000.0, 1000.0, 100.0, 390.0]
2025-09-12 15:29:28,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 55 minutes, 43 seconds)
2025-09-12 15:40:55,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:40:55,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:44:20,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3009.12524 ± 1472.848
2025-09-12 15:44:20,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4128.4746, 102.142426, 2501.9614, 2226.3557, 4061.0383, 4017.8694, 4020.43, 4205.7446, 4121.1577, 706.0773]
2025-09-12 15:44:20,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 681.0, 558.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 229.0]
2025-09-12 15:44:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 39 minutes, 24 seconds)
2025-09-12 15:55:59,208 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:55:59,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:59:44,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3342.25537 ± 1305.023
2025-09-12 15:59:44,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4121.649, 2152.7344, 4191.146, 4186.348, 4071.7148, 3996.929, 4364.263, 3983.5054, 2121.4282, 232.8354]
2025-09-12 15:59:44,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 573.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 572.0, 128.0]
2025-09-12 15:59:44,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 29 minutes, 34 seconds)
2025-09-12 16:11:04,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:11:04,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:14:28,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3033.16626 ± 1395.412
2025-09-12 16:14:28,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [180.0949, 1670.6558, 4164.911, 2549.941, 4005.984, 4062.1643, 1423.6348, 4039.6294, 4091.0928, 4143.5522]
2025-09-12 16:14:28,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 479.0, 1000.0, 651.0, 1000.0, 1000.0, 406.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:14:28,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 13 minutes, 9 seconds)
2025-09-12 16:25:27,382 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:25:27,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:28:15,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2440.25684 ± 1778.725
2025-09-12 16:28:15,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [66.61404, 2796.453, 4267.247, 892.83453, 206.69092, 174.32855, 4301.4404, 3895.3687, 4195.928, 3605.6638]
2025-09-12 16:28:15,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 659.0, 1000.0, 298.0, 97.0, 89.0, 1000.0, 995.0, 1000.0, 1000.0]
2025-09-12 16:28:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 55 minutes, 5 seconds)
2025-09-12 16:40:24,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:40:24,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:44:02,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3097.34912 ± 1368.913
2025-09-12 16:44:02,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [650.86096, 3657.5696, 4129.332, 1960.1072, 4017.192, 4009.39, 3914.702, 3865.7524, 633.8024, 4134.7817]
2025-09-12 16:44:02,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [228.0, 1000.0, 1000.0, 560.0, 1000.0, 1000.0, 1000.0, 1000.0, 228.0, 1000.0]
2025-09-12 16:44:02,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 44 minutes, 2 seconds)
2025-09-12 16:55:04,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:55:04,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:57:45,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2320.00122 ± 1800.995
2025-09-12 16:57:45,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [249.35027, 6.2711334, 680.2812, 500.1117, 4093.4458, 1304.769, 4019.8608, 3980.6433, 4055.5593, 4309.7217]
2025-09-12 16:57:45,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 18.0, 242.0, 193.0, 1000.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:57:45,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 26 minutes, 48 seconds)
2025-09-12 17:09:20,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:09:20,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:12:19,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2614.57178 ± 1815.896
2025-09-12 17:12:19,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [30.363476, 4086.3604, 4267.2314, 4111.124, 4061.4758, 4007.403, 3904.6326, 387.22855, 1253.9921, 35.905308]
2025-09-12 17:12:19,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 993.0, 137.0, 425.0, 55.0]
2025-09-12 17:12:19,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 10 minutes, 40 seconds)
2025-09-12 17:23:36,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:36,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:26:20,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2421.50366 ± 1743.464
2025-09-12 17:26:20,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4018.7668, 3105.9255, 4321.3965, 4129.954, 2773.903, 5.406271, 82.542465, 4329.0635, 638.6869, 809.3933]
2025-09-12 17:26:20,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 746.0, 1000.0, 1000.0, 722.0, 20.0, 64.0, 1000.0, 202.0, 258.0]
2025-09-12 17:26:20,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 54 minutes, 59 seconds)
2025-09-12 17:37:46,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:37:46,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:41:14,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 2921.94312 ± 1480.259
2025-09-12 17:41:14,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1590.0303, 483.79483, 3937.4705, 101.25996, 3775.076, 3869.2012, 3956.0251, 3802.488, 3904.806, 3799.2766]
2025-09-12 17:41:14,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [460.0, 220.0, 1000.0, 71.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:41:14,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 42 minutes, 10 seconds)
2025-09-12 17:52:24,284 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:52:24,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:56:09,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3343.63037 ± 1375.708
2025-09-12 17:56:09,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [3805.845, 4215.575, 40.532593, 1499.1436, 4191.3877, 4233.0977, 3800.5862, 4341.4263, 3049.4392, 4259.2725]
2025-09-12 17:56:09,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 53.0, 421.0, 1000.0, 1000.0, 1000.0, 1000.0, 806.0, 1000.0]
2025-09-12 17:56:09,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 26 minutes, 32 seconds)
2025-09-12 18:08:15,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:08:15,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:12:37,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 4112.42334 ± 617.714
2025-09-12 18:12:37,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4589.552, 4202.0215, 4304.484, 4216.16, 4294.1465, 2297.8616, 4279.139, 4426.6074, 4118.674, 4395.585]
2025-09-12 18:12:37,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 603.0, 1000.0, 1000.0, 969.0, 1000.0]
2025-09-12 18:12:37,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (4112.42) for latency ExtremeClogL1U23
2025-09-12 18:12:37,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 14 minutes, 52 seconds)
2025-09-12 18:23:30,610 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:23:30,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:27:49,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3824.03174 ± 625.399
2025-09-12 18:27:49,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4165.2266, 2152.4314, 4099.653, 4137.699, 4008.4285, 4149.2944, 3976.4707, 4091.4607, 4265.5107, 3194.1414]
2025-09-12 18:27:49,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 588.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 758.0]
2025-09-12 18:27:49,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 23 seconds)
2025-09-12 18:39:21,485 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:39:21,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:43:14,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3433.26709 ± 1273.301
2025-09-12 18:43:14,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1435.9336, 4067.176, 4276.9736, 3836.0027, 4008.0232, 3914.9937, 4025.3699, 4239.499, 4088.082, 440.61725]
2025-09-12 18:43:14,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [446.0, 1000.0, 1000.0, 1000.0, 875.0, 1000.0, 1000.0, 1000.0, 1000.0, 154.0]
2025-09-12 18:43:14,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 7 seconds)
2025-09-12 18:55:18,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:55:18,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:58:49,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3235.27637 ± 1529.923
2025-09-12 18:58:49,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4071.7458, 1221.6447, 116.14899, 4253.7476, 4252.5176, 1559.392, 3879.3472, 4307.688, 4427.686, 4262.845]
2025-09-12 18:58:49,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [949.0, 377.0, 66.0, 1000.0, 1000.0, 397.0, 932.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:58:49,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 2 seconds)
2025-09-12 19:10:04,798 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:10:04,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:14:40,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 4295.73047 ± 128.363
2025-09-12 19:14:40,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4444.1313, 4354.9536, 4373.997, 4026.946, 4445.4004, 4266.5776, 4269.8403, 4295.9517, 4363.614, 4115.8945]
2025-09-12 19:14:40,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:14:40,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (4295.73) for latency ExtremeClogL1U23
2025-09-12 19:14:40,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-09-12 19:27:06,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:27:06,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:30:52,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 3491.39844 ± 1419.848
2025-09-12 19:30:52,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4228.366, 4245.3394, 1521.9698, 4026.159, 4274.33, 4435.4795, 3654.7842, 9.224373, 4104.1387, 4414.194]
2025-09-12 19:30:52,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 393.0, 1000.0, 1000.0, 1000.0, 836.0, 19.0, 1000.0, 1000.0]
2025-09-12 19:30:52,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1251 [DEBUG]: Training session finished
