2025-09-11 19:35:00,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:35:00,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:35:00,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x148cc6a35550>}
2025-09-11 19:35:00,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1111 [DEBUG]: using device: cuda
2025-09-11 19:35:00,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1133 [INFO]: Creating new trainer
2025-09-11 19:35:00,128 baseline-mbpac-noiseperc5-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 19:35:00,129 baseline-mbpac-noiseperc5-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:35:00,136 baseline-mbpac-noiseperc5-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 19:35:01,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1194 [DEBUG]: Starting training session...
2025-09-11 19:35:01,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 1/100
2025-09-11 19:45:17,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:45:17,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:48:01,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 656.04504 ± 369.095
2025-09-11 19:48:01,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [305.8449, 1044.9625, 328.8519, 269.186, 299.56912, 236.08069, 994.07263, 1032.3403, 1009.333, 1040.2095]
2025-09-11 19:48:01,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 1000.0, 207.0, 149.0, 173.0, 126.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:48:01,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (656.05) for latency ExtremeClogL1U23
2025-09-11 19:48:01,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 27 minutes, 19 seconds)
2025-09-11 19:59:44,925 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:59:44,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:00:18,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3.43157 ± 27.262
2025-09-11 20:00:18,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [-8.547316, 4.648828, 37.600994, 8.296912, 10.317875, -19.162788, -13.352816, -44.865204, 57.480804, 1.8983521]
2025-09-11 20:00:18,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 71.0, 124.0, 143.0, 155.0, 93.0, 149.0, 118.0, 94.0, 126.0]
2025-09-11 20:00:18,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 38 minutes, 49 seconds)
2025-09-11 20:12:06,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:12:06,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:13:11,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 297.59317 ± 175.443
2025-09-11 20:13:11,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [269.83426, 284.89743, 454.92847, 76.43961, 171.0702, 142.39375, 412.39572, 610.6722, 75.35698, 477.94278]
2025-09-11 20:13:11,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 242.0, 259.0, 209.0, 217.0, 120.0, 282.0, 518.0, 122.0, 207.0]
2025-09-11 20:13:11,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 20 hours, 34 minutes, 8 seconds)
2025-09-11 20:24:35,504 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:24:35,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:25:59,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 145.62447 ± 94.876
2025-09-11 20:25:59,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [11.453817, 276.47546, 324.83136, 95.19908, 119.3028, 183.73778, 60.529305, 127.64714, 199.03041, 58.03745]
2025-09-11 20:25:59,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [123.0, 375.0, 533.0, 252.0, 216.0, 402.0, 174.0, 329.0, 368.0, 226.0]
2025-09-11 20:25:59,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 20 hours, 23 minutes, 6 seconds)
2025-09-11 20:37:40,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:37:40,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:38:48,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 328.09528 ± 204.619
2025-09-11 20:38:48,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [458.50354, 83.91211, 18.005419, 282.21967, 304.43036, 769.4463, 456.56607, 395.18893, 163.7571, 348.92343]
2025-09-11 20:38:48,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 108.0, 27.0, 295.0, 187.0, 664.0, 328.0, 240.0, 169.0, 187.0]
2025-09-11 20:38:48,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 20 hours, 12 minutes, 1 second)
2025-09-11 20:51:02,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:51:02,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:52:37,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 282.22900 ± 135.349
2025-09-11 20:52:37,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [397.0811, 27.434269, 353.9747, 35.434616, 222.63635, 283.6982, 381.52493, 393.81342, 364.63022, 362.06213]
2025-09-11 20:52:37,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 103.0, 228.0, 151.0, 424.0, 579.0, 297.0, 403.0, 712.0, 278.0]
2025-09-11 20:52:37,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 20 hours, 14 minutes, 22 seconds)
2025-09-11 21:03:34,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:03:34,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:04:39,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 384.67709 ± 166.234
2025-09-11 21:04:39,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [716.5079, 573.40424, 416.91428, 356.96744, 349.33633, 60.74718, 378.73615, 301.9527, 426.0022, 266.20248]
2025-09-11 21:04:39,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 301.0, 243.0, 225.0, 206.0, 146.0, 259.0, 193.0, 262.0, 174.0]
2025-09-11 21:04:39,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 57 minutes, 3 seconds)
2025-09-11 21:16:32,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:16:32,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:18:42,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 539.69489 ± 238.705
2025-09-11 21:18:42,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [526.6368, 430.89987, 966.3595, 1009.48096, 411.55945, 427.8227, 518.1614, 440.10913, 209.95465, 455.96475]
2025-09-11 21:18:42,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 317.0, 1000.0, 1000.0, 853.0, 215.0, 272.0, 247.0, 249.0, 238.0]
2025-09-11 21:18:42,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 5 minutes, 31 seconds)
2025-09-11 21:30:16,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:30:16,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:31:06,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 313.23404 ± 118.069
2025-09-11 21:31:06,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [161.82977, 323.4336, 458.84082, 325.23804, 458.60178, 209.6513, 402.01196, 86.2311, 336.5869, 369.91507]
2025-09-11 21:31:06,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 189.0, 208.0, 185.0, 289.0, 140.0, 204.0, 132.0, 159.0, 193.0]
2025-09-11 21:31:06,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 19 hours, 45 minutes, 11 seconds)
2025-09-11 21:42:28,716 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:42:28,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:43:27,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 383.57159 ± 126.148
2025-09-11 21:43:27,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [540.18427, 382.37042, 232.9503, 645.25696, 442.5417, 231.00397, 382.7506, 263.80942, 335.36935, 379.4787]
2025-09-11 21:43:27,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 187.0, 202.0, 272.0, 178.0, 187.0, 297.0, 163.0, 187.0, 191.0]
2025-09-11 21:43:27,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 23 minutes, 36 seconds)
2025-09-11 21:55:04,977 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:55:04,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:56:12,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 481.85333 ± 137.538
2025-09-11 21:56:12,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [582.58606, 292.64804, 455.38635, 787.84296, 377.44418, 549.40826, 526.4779, 429.10104, 313.99023, 503.64856]
2025-09-11 21:56:12,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [399.0, 142.0, 198.0, 338.0, 185.0, 265.0, 223.0, 221.0, 190.0, 265.0]
2025-09-11 21:56:12,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 51 minutes, 59 seconds)
2025-09-11 22:07:43,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:07:43,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:08:24,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 312.31110 ± 145.575
2025-09-11 22:08:24,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [31.819742, 447.05237, 253.4603, 154.70859, 422.3242, 240.35307, 526.33374, 272.2032, 313.91605, 460.93988]
2025-09-11 22:08:24,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [55.0, 193.0, 128.0, 83.0, 186.0, 119.0, 211.0, 137.0, 152.0, 232.0]
2025-09-11 22:08:25,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 42 minutes, 3 seconds)
2025-09-11 22:20:12,162 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:20:12,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:20:59,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 316.92947 ± 135.712
2025-09-11 22:20:59,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [90.62572, 517.855, 204.21606, 435.7066, 442.52808, 410.19748, 391.6099, 300.60153, 221.18965, 154.76474]
2025-09-11 22:20:59,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [76.0, 215.0, 111.0, 233.0, 273.0, 239.0, 214.0, 161.0, 118.0, 85.0]
2025-09-11 22:20:59,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 3 minutes, 40 seconds)
2025-09-11 22:32:27,563 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:32:27,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:33:44,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 399.27667 ± 191.446
2025-09-11 22:33:44,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [63.150135, 314.208, 385.55304, 228.42398, 299.7878, 484.23898, 588.5241, 730.097, 606.74695, 292.0366]
2025-09-11 22:33:44,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 189.0, 203.0, 312.0, 185.0, 248.0, 364.0, 318.0, 627.0, 154.0]
2025-09-11 22:33:44,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 57 minutes, 15 seconds)
2025-09-11 22:45:55,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:45:55,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:47:28,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 497.51431 ± 182.570
2025-09-11 22:47:28,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [651.837, 407.77066, 283.97763, 394.95218, 407.0718, 869.1092, 351.95755, 369.24847, 745.3402, 493.8785]
2025-09-11 22:47:28,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [555.0, 294.0, 137.0, 216.0, 246.0, 519.0, 180.0, 220.0, 365.0, 249.0]
2025-09-11 22:47:28,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 8 minutes, 16 seconds)
2025-09-11 23:00:30,078 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:00:30,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:01:35,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 416.73810 ± 86.949
2025-09-11 23:01:35,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [306.01157, 351.38678, 439.11807, 448.2991, 313.43784, 357.32166, 357.38657, 521.0012, 526.9175, 546.50024]
2025-09-11 23:01:35,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 169.0, 191.0, 249.0, 158.0, 188.0, 207.0, 260.0, 272.0, 309.0]
2025-09-11 23:01:35,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 18 minutes, 24 seconds)
2025-09-11 23:14:08,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:14:08,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:15:10,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 506.55933 ± 167.556
2025-09-11 23:15:10,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [647.0002, 349.568, 870.80914, 520.13904, 435.66382, 426.05157, 687.27454, 427.37305, 302.9178, 398.79608]
2025-09-11 23:15:10,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [258.0, 149.0, 397.0, 206.0, 151.0, 181.0, 234.0, 189.0, 128.0, 170.0]
2025-09-11 23:15:10,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 28 minutes, 3 seconds)
2025-09-11 23:27:51,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:27:51,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:29:16,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 523.22668 ± 230.081
2025-09-11 23:29:16,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [624.2653, 418.4702, 369.72858, 494.91003, 328.76312, 346.0774, 551.24133, 1148.7078, 380.2454, 569.8572]
2025-09-11 23:29:16,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [353.0, 212.0, 195.0, 198.0, 177.0, 195.0, 328.0, 710.0, 198.0, 285.0]
2025-09-11 23:29:16,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 39 minutes, 58 seconds)
2025-09-11 23:41:45,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:41:45,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:43:13,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 645.85028 ± 246.712
2025-09-11 23:43:13,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [974.0249, 490.2671, 951.25006, 535.934, 1071.6726, 370.10983, 346.30502, 514.554, 642.68445, 561.70074]
2025-09-11 23:43:13,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [489.0, 259.0, 372.0, 304.0, 415.0, 205.0, 169.0, 262.0, 234.0, 215.0]
2025-09-11 23:43:13,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 45 minutes, 44 seconds)
2025-09-11 23:55:36,900 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:55:36,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:57:03,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 685.47266 ± 253.688
2025-09-11 23:57:03,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1049.3619, 454.16876, 893.4888, 848.4896, 1041.2949, 703.0672, 663.214, 489.44287, 306.11572, 406.08258]
2025-09-11 23:57:03,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [406.0, 198.0, 324.0, 289.0, 345.0, 372.0, 412.0, 192.0, 138.0, 219.0]
2025-09-11 23:57:03,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (685.47) for latency ExtremeClogL1U23
2025-09-11 23:57:03,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 33 minutes, 21 seconds)
2025-09-12 00:09:46,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:09:46,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:11:23,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 781.66461 ± 128.654
2025-09-12 00:11:23,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [875.58606, 908.08795, 542.1598, 751.96716, 679.5757, 870.67334, 787.86554, 902.5372, 595.2198, 902.97296]
2025-09-12 00:11:23,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 327.0, 278.0, 314.0, 249.0, 389.0, 296.0, 337.0, 252.0, 450.0]
2025-09-12 00:11:23,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (781.66) for latency ExtremeClogL1U23
2025-09-12 00:11:23,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 22 minutes, 37 seconds)
2025-09-12 00:23:55,274 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:23:55,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:25:35,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 870.95203 ± 144.293
2025-09-12 00:25:35,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1127.3398, 778.78937, 849.48303, 898.497, 743.2525, 1014.06244, 722.62616, 1077.7, 722.21735, 775.5536]
2025-09-12 00:25:35,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [386.0, 344.0, 326.0, 337.0, 301.0, 391.0, 277.0, 425.0, 255.0, 328.0]
2025-09-12 00:25:35,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (870.95) for latency ExtremeClogL1U23
2025-09-12 00:25:35,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 18 minutes, 44 seconds)
2025-09-12 00:38:23,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:38:23,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:39:55,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 706.53937 ± 335.509
2025-09-12 00:39:55,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1084.0054, 1165.6552, 766.6238, 693.2819, 915.0429, 841.41003, 253.60184, 862.2167, 392.69617, 90.85962]
2025-09-12 00:39:55,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [557.0, 454.0, 290.0, 254.0, 362.0, 331.0, 157.0, 310.0, 219.0, 102.0]
2025-09-12 00:39:55,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 7 minutes, 56 seconds)
2025-09-12 00:52:40,595 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:52:40,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:54:06,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 717.39563 ± 79.404
2025-09-12 00:54:06,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [847.14874, 720.14404, 740.7666, 662.8073, 690.0844, 831.42303, 743.8958, 702.1051, 553.94165, 681.6398]
2025-09-12 00:54:06,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 299.0, 282.0, 282.0, 262.0, 322.0, 305.0, 269.0, 203.0, 295.0]
2025-09-12 00:54:06,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 17 hours, 57 minutes, 17 seconds)
2025-09-12 01:06:26,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:26,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:08:38,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1175.64929 ± 645.386
2025-09-12 01:08:38,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1262.3132, 1387.5468, 624.54846, 2447.1096, 1074.0022, 776.4417, 1985.9048, 1236.2229, 35.052998, 927.3503]
2025-09-12 01:08:38,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [464.0, 523.0, 287.0, 810.0, 387.0, 303.0, 716.0, 461.0, 54.0, 351.0]
2025-09-12 01:08:38,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1175.65) for latency ExtremeClogL1U23
2025-09-12 01:08:38,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 17 hours, 53 minutes, 40 seconds)
2025-09-12 01:21:23,777 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:21:23,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:22:54,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 757.63110 ± 255.309
2025-09-12 01:22:54,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [891.8849, 323.2001, 913.5231, 257.26364, 854.244, 821.2403, 855.633, 635.80707, 1053.0948, 970.4205]
2025-09-12 01:22:54,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [348.0, 160.0, 329.0, 142.0, 312.0, 280.0, 333.0, 281.0, 415.0, 404.0]
2025-09-12 01:22:54,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 17 hours, 38 minutes, 38 seconds)
2025-09-12 01:35:57,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:35:57,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:37:03,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 614.95087 ± 184.708
2025-09-12 01:37:03,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [677.0545, 610.98895, 736.85175, 715.7591, 697.77747, 613.8621, 621.9302, 675.45264, 723.3546, 76.47746]
2025-09-12 01:37:03,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 224.0, 264.0, 251.0, 243.0, 225.0, 227.0, 235.0, 245.0, 70.0]
2025-09-12 01:37:03,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 17 hours, 23 minutes, 24 seconds)
2025-09-12 01:49:41,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:49:41,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:51:10,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 745.19751 ± 294.902
2025-09-12 01:51:10,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [917.40576, 850.9917, 737.4757, 113.023865, 710.347, 829.78485, 1256.5509, 898.9865, 760.1697, 377.23856]
2025-09-12 01:51:10,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [365.0, 324.0, 269.0, 137.0, 290.0, 317.0, 430.0, 354.0, 276.0, 174.0]
2025-09-12 01:51:10,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 17 hours, 5 minutes, 54 seconds)
2025-09-12 02:03:43,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:03:43,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:05:25,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 933.76208 ± 223.621
2025-09-12 02:05:25,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [800.46783, 966.7026, 977.95984, 767.19574, 1002.34485, 717.0684, 1019.09784, 1522.2584, 769.2332, 795.29224]
2025-09-12 02:05:25,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 369.0, 323.0, 316.0, 338.0, 279.0, 379.0, 504.0, 264.0, 292.0]
2025-09-12 02:05:25,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 52 minutes, 47 seconds)
2025-09-12 02:18:00,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:18:00,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:19:48,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 958.56311 ± 155.346
2025-09-12 02:19:48,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1044.3889, 1274.8583, 876.84735, 893.27856, 994.1338, 821.2624, 784.339, 1063.9264, 1086.341, 746.25494]
2025-09-12 02:19:48,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [439.0, 465.0, 315.0, 313.0, 350.0, 292.0, 323.0, 364.0, 417.0, 292.0]
2025-09-12 02:19:48,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 36 minutes, 19 seconds)
2025-09-12 02:32:41,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:32:41,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:34:40,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1062.43726 ± 211.460
2025-09-12 02:34:40,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [863.8566, 894.7487, 1355.1788, 809.3251, 1051.9694, 1266.1805, 1417.8955, 1162.095, 910.7785, 892.3444]
2025-09-12 02:34:40,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [372.0, 267.0, 466.0, 343.0, 395.0, 480.0, 517.0, 428.0, 334.0, 330.0]
2025-09-12 02:34:40,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 30 minutes, 17 seconds)
2025-09-12 02:47:08,020 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:47:08,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:48:56,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1028.25574 ± 419.766
2025-09-12 02:48:56,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [855.27185, 813.2232, 1121.7601, 744.72974, 1069.245, 747.22534, 856.82635, 1734.5881, 481.47632, 1858.2115]
2025-09-12 02:48:56,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 293.0, 366.0, 277.0, 372.0, 305.0, 323.0, 561.0, 215.0, 585.0]
2025-09-12 02:48:56,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 17 minutes, 25 seconds)
2025-09-12 03:01:29,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:01:29,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:03:03,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 743.59686 ± 137.185
2025-09-12 03:03:03,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [603.09454, 683.2924, 790.5207, 747.8894, 836.6327, 1014.0285, 533.1691, 897.09094, 647.02014, 683.2305]
2025-09-12 03:03:03,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 311.0, 290.0, 266.0, 317.0, 374.0, 229.0, 515.0, 269.0, 256.0]
2025-09-12 03:03:03,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 3 minutes, 10 seconds)
2025-09-12 03:15:43,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:15:43,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:17:04,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 719.36261 ± 236.074
2025-09-12 03:17:04,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [923.45996, 557.93964, 873.0371, 877.6119, 782.9034, 606.2924, 879.9355, 183.50839, 521.3097, 987.62805]
2025-09-12 03:17:04,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [328.0, 187.0, 316.0, 324.0, 280.0, 232.0, 314.0, 117.0, 220.0, 356.0]
2025-09-12 03:17:04,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 15 hours, 45 minutes, 44 seconds)
2025-09-12 03:30:50,258 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:30:50,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:33:09,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1373.54114 ± 741.167
2025-09-12 03:33:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1164.9055, 1079.6179, 1065.4916, 710.13043, 1574.8497, 2846.875, 991.3998, 2491.1433, 1531.5693, 279.42825]
2025-09-12 03:33:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [383.0, 376.0, 373.0, 230.0, 524.0, 904.0, 328.0, 841.0, 506.0, 165.0]
2025-09-12 03:33:09,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1373.54) for latency ExtremeClogL1U23
2025-09-12 03:33:09,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 53 minutes, 37 seconds)
2025-09-12 03:45:14,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:45:14,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:47:30,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1425.84473 ± 497.031
2025-09-12 03:47:30,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2123.229, 1634.0505, 1551.2966, 2059.319, 1778.3892, 1190.1046, 1021.67035, 980.9934, 436.2597, 1483.1345]
2025-09-12 03:47:30,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [639.0, 509.0, 459.0, 661.0, 513.0, 404.0, 317.0, 308.0, 171.0, 490.0]
2025-09-12 03:47:30,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1425.84) for latency ExtremeClogL1U23
2025-09-12 03:47:30,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 32 minutes, 12 seconds)
2025-09-12 04:00:26,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:00:26,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:03:06,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1520.01660 ± 1011.204
2025-09-12 04:03:06,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [953.1903, 1467.8887, 636.91766, 620.9113, 3047.737, 1289.8857, 800.1842, 512.7382, 2495.4536, 3375.2603]
2025-09-12 04:03:06,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [324.0, 524.0, 245.0, 246.0, 1000.0, 444.0, 301.0, 283.0, 871.0, 1000.0]
2025-09-12 04:03:06,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1520.02) for latency ExtremeClogL1U23
2025-09-12 04:03:06,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 34 minutes, 35 seconds)
2025-09-12 04:15:12,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:15:12,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:18:18,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1939.51526 ± 925.311
2025-09-12 04:18:18,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3259.8657, 1673.8756, 939.77606, 1938.7174, 2183.8005, 2456.9458, 496.08835, 1657.556, 3586.876, 1201.6525]
2025-09-12 04:18:18,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 532.0, 359.0, 594.0, 716.0, 740.0, 187.0, 520.0, 1000.0, 437.0]
2025-09-12 04:18:18,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (1939.52) for latency ExtremeClogL1U23
2025-09-12 04:18:18,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 33 minutes, 7 seconds)
2025-09-12 04:31:32,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:31:32,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:35:07,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2187.94336 ± 1038.342
2025-09-12 04:35:07,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3169.3333, 1110.9762, 3151.5737, 1677.4896, 3179.3774, 1804.5082, 524.71027, 2933.4282, 935.9682, 3392.0667]
2025-09-12 04:35:07,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 350.0, 1000.0, 566.0, 1000.0, 533.0, 272.0, 1000.0, 301.0, 1000.0]
2025-09-12 04:35:07,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2187.94) for latency ExtremeClogL1U23
2025-09-12 04:35:07,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 52 minutes, 11 seconds)
2025-09-12 04:48:02,387 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:48:02,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:51:09,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1997.28577 ± 1008.466
2025-09-12 04:51:09,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1580.3927, 2970.05, 2298.0679, 1891.8286, 2858.2964, 3206.2073, 3171.7034, 517.5867, 816.9395, 661.786]
2025-09-12 04:51:09,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [506.0, 917.0, 698.0, 602.0, 830.0, 1000.0, 1000.0, 185.0, 256.0, 245.0]
2025-09-12 04:51:09,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 35 minutes, 54 seconds)
2025-09-12 05:03:23,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:03:23,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:06:14,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 1914.25842 ± 957.350
2025-09-12 05:06:14,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1401.6434, 1060.5704, 1318.0269, 2173.6868, 1788.6826, 3406.2642, 1393.2698, 2515.689, 3598.4934, 486.2572]
2025-09-12 05:06:14,728 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [433.0, 338.0, 424.0, 634.0, 547.0, 1000.0, 418.0, 734.0, 1000.0, 189.0]
2025-09-12 05:06:14,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 29 minutes, 9 seconds)
2025-09-12 05:19:00,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:19:00,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:22:41,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2393.00732 ± 1046.441
2025-09-12 05:22:41,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [817.8981, 3193.0283, 1314.0322, 3283.8525, 3376.6333, 2576.0596, 544.42236, 2369.9548, 3051.9607, 3402.2334]
2025-09-12 05:22:41,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 1000.0, 425.0, 1000.0, 1000.0, 802.0, 208.0, 668.0, 893.0, 1000.0]
2025-09-12 05:22:41,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2393.01) for latency ExtremeClogL1U23
2025-09-12 05:22:41,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 23 minutes, 14 seconds)
2025-09-12 05:35:24,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:35:24,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:39:10,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2594.14648 ± 1166.087
2025-09-12 05:39:10,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1886.2078, 464.32803, 3597.5771, 674.2074, 2235.394, 3684.3079, 3511.6367, 3014.8264, 3673.401, 3199.581]
2025-09-12 05:39:10,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [552.0, 187.0, 1000.0, 227.0, 670.0, 1000.0, 1000.0, 879.0, 1000.0, 941.0]
2025-09-12 05:39:10,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (2594.15) for latency ExtremeClogL1U23
2025-09-12 05:39:10,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 21 minutes, 52 seconds)
2025-09-12 05:52:00,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:52:00,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:55:43,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2467.57959 ± 834.343
2025-09-12 05:55:43,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3556.1375, 1725.2026, 1972.3915, 3334.8706, 3152.0515, 3160.7642, 2946.5347, 2334.177, 1086.9075, 1406.7601]
2025-09-12 05:55:43,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 603.0, 543.0, 1000.0, 912.0, 1000.0, 860.0, 697.0, 380.0, 405.0]
2025-09-12 05:55:43,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 2 minutes, 48 seconds)
2025-09-12 06:08:26,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:08:26,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:11:49,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2245.64209 ± 1238.020
2025-09-12 06:11:49,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1701.7993, 1560.2926, 121.568794, 3361.3586, 3527.4827, 714.5305, 3397.359, 3267.5342, 3497.1821, 1307.3146]
2025-09-12 06:11:49,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [481.0, 494.0, 129.0, 1000.0, 1000.0, 221.0, 1000.0, 1000.0, 1000.0, 420.0]
2025-09-12 06:11:49,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 47 minutes, 21 seconds)
2025-09-12 06:24:11,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:24:11,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:27:28,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2034.30664 ± 1184.790
2025-09-12 06:27:28,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2452.8022, 3164.4558, 810.57465, 3262.901, 2528.7969, 218.65546, 3075.983, 1392.0741, 245.37373, 3191.4507]
2025-09-12 06:27:28,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [759.0, 1000.0, 261.0, 1000.0, 773.0, 125.0, 1000.0, 508.0, 96.0, 1000.0]
2025-09-12 06:27:28,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 37 minutes, 15 seconds)
2025-09-12 06:40:46,639 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:40:46,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:45:22,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3153.70605 ± 520.486
2025-09-12 06:45:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3514.062, 3589.1611, 2927.2888, 2480.445, 3568.481, 3428.6504, 1962.8594, 3460.1091, 3504.03, 3101.9746]
2025-09-12 06:45:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 851.0, 702.0, 1000.0, 1000.0, 591.0, 1000.0, 1000.0, 912.0]
2025-09-12 06:45:22,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3153.71) for latency ExtremeClogL1U23
2025-09-12 06:45:22,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 36 minutes, 27 seconds)
2025-09-12 06:57:51,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:57:51,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:01:39,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2676.38770 ± 1142.479
2025-09-12 07:01:39,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1630.872, 2923.7393, 3710.0342, 3571.0293, 3784.584, 1451.6581, 3615.2122, 1938.8826, 466.1062, 3671.7598]
2025-09-12 07:01:39,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [468.0, 780.0, 1000.0, 1000.0, 1000.0, 433.0, 1000.0, 574.0, 240.0, 1000.0]
2025-09-12 07:01:39,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 17 minutes, 52 seconds)
2025-09-12 07:13:38,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:13:38,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:18:17,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3116.31494 ± 545.725
2025-09-12 07:18:17,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3487.4292, 3402.1533, 3137.323, 3396.457, 1694.7676, 3414.6077, 3218.9753, 2527.1995, 3450.6848, 3433.5486]
2025-09-12 07:18:17,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 518.0, 1000.0, 1000.0, 783.0, 1000.0, 1000.0]
2025-09-12 07:18:17,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 2 minutes, 3 seconds)
2025-09-12 07:31:02,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:31:02,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:36:00,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3473.00781 ± 163.393
2025-09-12 07:36:00,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3572.0286, 3644.6724, 3117.4565, 3401.5295, 3636.1147, 3519.6567, 3623.1885, 3321.522, 3548.283, 3345.6267]
2025-09-12 07:36:00,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 887.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:36:00,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3473.01) for latency ExtremeClogL1U23
2025-09-12 07:36:00,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 1 minute, 54 seconds)
2025-09-12 07:49:46,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:49:46,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:54:25,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3385.64062 ± 510.343
2025-09-12 07:54:25,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3680.306, 3572.2917, 3541.086, 3610.2815, 2178.2656, 2600.5906, 3665.6025, 3748.602, 3580.1748, 3679.2065]
2025-09-12 07:54:25,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 619.0, 694.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:54:25,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 14 hours, 12 minutes, 8 seconds)
2025-09-12 08:06:14,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:06:14,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:11:11,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3682.17896 ± 333.764
2025-09-12 08:11:11,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3846.7273, 3885.7908, 3693.1938, 3774.22, 2691.7085, 3802.357, 3814.226, 3762.4844, 3769.872, 3781.2102]
2025-09-12 08:11:11,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 749.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:11:11,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3682.18) for latency ExtremeClogL1U23
2025-09-12 08:11:11,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 43 minutes, 41 seconds)
2025-09-12 08:24:35,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:24:35,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:28:41,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2868.61475 ± 717.714
2025-09-12 08:28:41,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3629.0154, 2476.3872, 3740.0881, 2253.3694, 3436.343, 3416.223, 1639.3041, 3538.4624, 2313.6318, 2243.3218]
2025-09-12 08:28:41,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 692.0, 1000.0, 642.0, 1000.0, 1000.0, 488.0, 995.0, 680.0, 647.0]
2025-09-12 08:28:41,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 38 minutes, 9 seconds)
2025-09-12 08:41:31,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:41:31,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:46:12,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3436.91797 ± 781.549
2025-09-12 08:46:12,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3768.4285, 3808.861, 3727.8335, 3720.42, 3791.5146, 3654.8938, 3730.5251, 1117.2064, 3385.126, 3664.3699]
2025-09-12 08:46:12,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 319.0, 900.0, 1000.0]
2025-09-12 08:46:12,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 13 hours, 28 minutes, 53 seconds)
2025-09-12 08:58:26,596 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:58:26,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:02:33,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2933.79932 ± 1087.662
2025-09-12 09:02:33,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3787.1472, 3536.305, 3589.4368, 3501.5269, 1100.3204, 3659.603, 1469.2017, 3670.0085, 1276.1461, 3748.2961]
2025-09-12 09:02:33,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 951.0, 1000.0, 346.0, 1000.0, 436.0, 1000.0, 409.0, 1000.0]
2025-09-12 09:02:33,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 58 minutes, 52 seconds)
2025-09-12 09:14:53,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:14:53,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:19:46,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3540.39185 ± 430.038
2025-09-12 09:19:46,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3595.0928, 3684.007, 3775.0308, 3647.1418, 3704.2104, 3552.9888, 3716.4656, 3779.9578, 2266.194, 3682.8284]
2025-09-12 09:19:46,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 630.0, 1000.0]
2025-09-12 09:19:46,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 31 minutes, 4 seconds)
2025-09-12 09:32:31,355 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:32:31,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:37:05,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3358.17114 ± 790.831
2025-09-12 09:37:05,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3523.2092, 3705.6863, 3804.603, 1416.7277, 2265.7766, 3856.4019, 3901.0017, 3782.8997, 3529.4067, 3795.997]
2025-09-12 09:37:05,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 414.0, 613.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:37:05,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 18 minutes, 52 seconds)
2025-09-12 09:50:22,529 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:50:22,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:55:08,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3386.74292 ± 723.775
2025-09-12 09:55:08,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3571.2031, 3653.0537, 1235.46, 3511.1064, 3717.1353, 3600.3198, 3795.5605, 3434.8716, 3688.5227, 3660.1936]
2025-09-12 09:55:08,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 373.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:55:08,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 12 hours, 6 minutes, 10 seconds)
2025-09-12 10:08:25,014 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:08:25,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:12:57,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3272.64600 ± 791.049
2025-09-12 10:12:57,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3696.3264, 3721.8843, 3601.9353, 3675.8306, 3658.8792, 1676.9133, 3652.1848, 3672.1533, 3663.9104, 1706.4409]
2025-09-12 10:12:57,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 472.0, 1000.0, 1000.0, 1000.0, 475.0]
2025-09-12 10:12:57,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 51 minutes, 21 seconds)
2025-09-12 10:24:55,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:24:55,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:29:27,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3173.43115 ± 886.027
2025-09-12 10:29:27,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3591.4624, 2748.2559, 3559.3184, 3567.3604, 3647.624, 3624.5918, 3244.246, 3571.294, 3548.129, 632.0317]
2025-09-12 10:29:27,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 780.0, 1000.0, 1000.0, 1000.0, 1000.0, 897.0, 1000.0, 1000.0, 256.0]
2025-09-12 10:29:27,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 35 minutes, 12 seconds)
2025-09-12 10:42:49,274 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:42:49,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:47:28,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3408.77344 ± 992.915
2025-09-12 10:47:28,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3656.6833, 3790.781, 3729.8508, 3853.25, 3797.164, 437.57895, 3759.0176, 3680.0789, 3779.9255, 3603.406]
2025-09-12 10:47:28,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 175.0, 992.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:47:28,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 23 minutes, 59 seconds)
2025-09-12 10:59:49,533 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:59:49,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:04:23,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3345.10620 ± 1084.623
2025-09-12 11:04:23,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3672.2375, 3484.0813, 3680.7136, 3786.0166, 103.66037, 3730.149, 3773.7637, 3826.1538, 3618.2048, 3776.0813]
2025-09-12 11:04:23,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [968.0, 963.0, 1000.0, 1000.0, 76.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:04:23,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 11 hours, 3 minutes, 28 seconds)
2025-09-12 11:17:31,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:17:31,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:22:11,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3443.80713 ± 767.545
2025-09-12 11:22:11,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1171.42, 3427.632, 3599.495, 3681.0144, 3823.6794, 3792.8555, 3804.8865, 3562.612, 3776.8887, 3797.588]
2025-09-12 11:22:11,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 893.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:22:11,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 44 minutes, 6 seconds)
2025-09-12 11:34:41,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:41,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:38:05,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2514.52783 ± 1517.115
2025-09-12 11:38:05,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2301.8389, 3870.7083, 3844.2683, 3828.2178, 3906.8354, 2396.4148, 927.31067, 3823.951, 46.97598, 198.7561]
2025-09-12 11:38:05,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [607.0, 1000.0, 1000.0, 1000.0, 1000.0, 616.0, 295.0, 1000.0, 44.0, 112.0]
2025-09-12 11:38:05,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 12 minutes, 56 seconds)
2025-09-12 11:51:15,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:51:15,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:55:55,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3447.01221 ± 784.728
2025-09-12 11:55:55,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1169.2007, 3815.6245, 3707.2583, 3710.3127, 3820.367, 3682.6836, 3833.4956, 3726.7825, 3868.229, 3136.1648]
2025-09-12 11:55:55,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 845.0]
2025-09-12 11:55:55,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 5 minutes, 18 seconds)
2025-09-12 12:07:41,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:41,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:12:19,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3376.42065 ± 969.311
2025-09-12 12:12:19,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [477.29987, 3655.4663, 3776.6853, 3747.2524, 3696.0715, 3722.8706, 3628.35, 3523.7498, 3787.2607, 3749.2004]
2025-09-12 12:12:19,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:12:19,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 37 minutes, 3 seconds)
2025-09-12 12:26:12,057 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:26:12,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:30:41,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3327.94067 ± 961.254
2025-09-12 12:30:41,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3775.6875, 3817.1233, 3802.6123, 3826.9082, 3703.1648, 1698.091, 3987.3716, 3822.907, 3689.718, 1155.8207]
2025-09-12 12:30:41,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0, 1000.0, 1000.0, 330.0]
2025-09-12 12:30:41,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 29 minutes, 33 seconds)
2025-09-12 12:42:28,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:42:28,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:47:11,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3427.37378 ± 570.582
2025-09-12 12:47:11,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3750.001, 2900.924, 3632.1545, 3570.112, 3717.2527, 3776.0334, 3645.593, 3756.1765, 1877.504, 3647.9885]
2025-09-12 12:47:11,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 791.0, 1000.0, 959.0, 1000.0, 1000.0, 1000.0, 1000.0, 543.0, 1000.0]
2025-09-12 12:47:11,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 4 minutes, 2 seconds)
2025-09-12 13:00:14,430 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:00:14,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:05:08,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3609.22510 ± 389.883
2025-09-12 13:05:08,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3741.7415, 3764.9739, 3722.7603, 3639.1511, 3773.8408, 2451.1165, 3715.3547, 3660.7004, 3828.3447, 3794.2656]
2025-09-12 13:05:08,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 660.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:05:08,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 59 minutes, 44 seconds)
2025-09-12 13:18:10,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:18:10,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:22:58,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3531.04932 ± 518.747
2025-09-12 13:22:58,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3765.2522, 3757.2202, 2521.0774, 3846.3564, 2475.328, 3851.264, 3778.5105, 3747.7705, 3862.6086, 3705.1042]
2025-09-12 13:22:58,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 757.0, 1000.0, 689.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:22:58,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 42 minutes, 15 seconds)
2025-09-12 13:35:27,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:35:27,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:39:57,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3222.84058 ± 1010.459
2025-09-12 13:39:57,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [240.0073, 3700.602, 3742.6292, 3511.146, 3583.4187, 3662.7988, 3067.3179, 3578.4922, 3656.8364, 3485.1592]
2025-09-12 13:39:57,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 828.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:39:57,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 28 minutes, 16 seconds)
2025-09-12 13:52:51,003 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:52:51,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:57:34,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3565.55859 ± 488.364
2025-09-12 13:57:34,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3359.1562, 3910.9922, 2955.0134, 3933.187, 3795.1875, 3933.4873, 2394.9316, 3790.4763, 3747.2952, 3835.859]
2025-09-12 13:57:34,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [874.0, 1000.0, 780.0, 1000.0, 1000.0, 1000.0, 624.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:57:34,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 6 minutes, 32 seconds)
2025-09-12 14:10:21,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:10:21,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:15:14,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3789.32373 ± 337.000
2025-09-12 14:15:14,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [2799.9465, 3934.9111, 3726.7146, 3901.8562, 3924.3093, 3831.417, 3939.9849, 3962.7524, 3894.2246, 3977.1208]
2025-09-12 14:15:14,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [722.0, 1000.0, 971.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:15:14,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3789.32) for latency ExtremeClogL1U23
2025-09-12 14:15:14,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 55 minutes, 26 seconds)
2025-09-12 14:27:30,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:30,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:32:23,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3619.30591 ± 546.704
2025-09-12 14:32:23,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3767.9724, 3776.928, 3761.895, 3812.2056, 1983.5155, 3864.1362, 3732.7766, 3856.3293, 3825.1755, 3812.1245]
2025-09-12 14:32:23,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 558.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:32:23,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 33 minutes, 39 seconds)
2025-09-12 14:44:56,307 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:44:56,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:50:00,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3899.61572 ± 69.287
2025-09-12 14:50:00,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3976.4617, 3888.7207, 3810.1753, 3809.5955, 3967.2183, 3940.022, 3966.0037, 3885.3335, 3959.9167, 3792.7075]
2025-09-12 14:50:00,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:50:00,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3899.62) for latency ExtremeClogL1U23
2025-09-12 14:50:01,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 15 minutes, 13 seconds)
2025-09-12 15:03:32,936 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:03:32,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:07:35,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3056.79053 ± 1285.594
2025-09-12 15:07:35,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3888.2356, 3853.2751, 277.40033, 2125.2815, 3836.9624, 1197.8344, 3930.0627, 3892.2449, 3902.669, 3663.9387]
2025-09-12 15:07:35,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 98.0, 595.0, 1000.0, 348.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:07:35,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 38 seconds)
2025-09-12 15:20:44,468 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:20:44,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:25:16,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3533.11255 ± 871.896
2025-09-12 15:25:16,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3918.9297, 2366.3467, 4033.095, 3896.7795, 3932.6362, 3970.7427, 3955.9324, 1338.4407, 3950.5894, 3967.6323]
2025-09-12 15:25:16,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 613.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 390.0, 1000.0, 1000.0]
2025-09-12 15:25:16,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 43 minutes, 26 seconds)
2025-09-12 15:38:03,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:38:03,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:43:05,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3832.25781 ± 72.844
2025-09-12 15:43:05,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3883.998, 3913.0667, 3833.012, 3820.2642, 3851.0986, 3763.6714, 3908.4583, 3781.479, 3670.708, 3896.8245]
2025-09-12 15:43:05,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:43:05,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 26 minutes, 32 seconds)
2025-09-12 15:54:43,764 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:54:43,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:59:29,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3667.73047 ± 732.898
2025-09-12 15:59:29,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3870.581, 3800.9163, 3886.9548, 3965.9182, 3856.882, 1475.8024, 3931.1626, 4008.5266, 3963.6978, 3916.8623]
2025-09-12 15:59:29,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 416.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:59:29,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 5 minutes, 49 seconds)
2025-09-12 16:12:59,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:12:59,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:17:47,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3601.71167 ± 763.664
2025-09-12 16:17:47,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1318.6123, 3839.2166, 3924.5479, 3830.5771, 3893.269, 3878.3516, 3786.6511, 3907.3035, 3713.2988, 3925.2886]
2025-09-12 16:17:47,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [374.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:17:47,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 51 minutes, 4 seconds)
2025-09-12 16:30:28,334 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:30:28,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:34:56,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3458.04639 ± 1033.165
2025-09-12 16:34:56,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4029.178, 3868.5935, 3994.656, 772.80585, 3993.3992, 4003.99, 3973.1318, 3904.647, 3809.9512, 2230.1096]
2025-09-12 16:34:56,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 236.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 603.0]
2025-09-12 16:34:56,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 31 minutes, 54 seconds)
2025-09-12 16:47:41,916 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:47:41,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:52:45,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3819.89453 ± 58.316
2025-09-12 16:52:45,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3887.0251, 3875.1414, 3812.3926, 3819.512, 3871.5894, 3684.5176, 3854.7349, 3810.2747, 3827.355, 3756.4011]
2025-09-12 16:52:45,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:52:45,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 14 minutes, 54 seconds)
2025-09-12 17:05:31,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:05:31,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:10:06,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3431.49927 ± 970.783
2025-09-12 17:10:06,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [613.03937, 3888.7864, 3805.5803, 3884.4517, 3021.8909, 3789.76, 3812.8145, 3861.056, 3828.8315, 3808.7803]
2025-09-12 17:10:06,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 1000.0, 1000.0, 1000.0, 806.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:10:06,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 55 minutes, 52 seconds)
2025-09-12 17:23:23,544 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:23,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:28:04,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3580.08594 ± 954.402
2025-09-12 17:28:04,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3904.374, 3852.6956, 3928.388, 719.2831, 3832.9773, 3892.7803, 3855.739, 3965.1667, 3936.5295, 3912.924]
2025-09-12 17:28:04,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 240.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:28:04,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 43 minutes, 28 seconds)
2025-09-12 17:40:29,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:40:29,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:44:41,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3273.05322 ± 1182.480
2025-09-12 17:44:41,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [585.01306, 4024.8142, 1539.3844, 3969.7136, 4009.8792, 3980.4277, 2763.5535, 3867.9058, 4026.7134, 3963.128]
2025-09-12 17:44:41,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 1000.0, 419.0, 1000.0, 1000.0, 1000.0, 703.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:44:41,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 20 minutes, 41 seconds)
2025-09-12 17:57:27,164 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:57:27,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:02:32,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3924.20386 ± 66.629
2025-09-12 18:02:32,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3936.7163, 3900.38, 4026.9343, 3872.5198, 3966.0703, 3907.4946, 3908.9531, 3999.7488, 3947.163, 3776.06]
2025-09-12 18:02:32,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:02:32,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1226 [INFO]: New best (3924.20) for latency ExtremeClogL1U23
2025-09-12 18:02:32,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 5 minutes, 15 seconds)
2025-09-12 18:15:18,832 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:15:18,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:20:18,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3807.02783 ± 58.579
2025-09-12 18:20:18,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3795.4844, 3747.33, 3862.8435, 3661.856, 3833.8865, 3812.0515, 3811.4907, 3866.1536, 3844.28, 3834.9026]
2025-09-12 18:20:18,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:20:18,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 47 minutes, 38 seconds)
2025-09-12 18:32:15,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:32:15,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:37:05,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3645.27734 ± 513.594
2025-09-12 18:37:05,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3813.977, 3877.9565, 3488.5508, 3871.3718, 3878.2656, 3882.7766, 3729.8938, 3906.1702, 3857.9844, 2145.8286]
2025-09-12 18:37:05,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 903.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 594.0]
2025-09-12 18:37:05,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 28 minutes, 46 seconds)
2025-09-12 18:49:53,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:49:53,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:54:08,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3189.83057 ± 1201.304
2025-09-12 18:54:08,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3809.1316, 3046.1394, 3926.26, 1121.5836, 3870.2314, 3855.8162, 3875.5425, 3930.2808, 3881.8164, 581.50366]
2025-09-12 18:54:08,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 818.0, 1000.0, 344.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 197.0]
2025-09-12 18:54:08,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 9 minutes, 20 seconds)
2025-09-12 19:06:55,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:06:55,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:11:24,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3388.50317 ± 1049.198
2025-09-12 19:11:24,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3827.9395, 2120.056, 3943.1917, 670.7697, 3881.8564, 3912.358, 3815.1323, 3940.1523, 3810.526, 3963.048]
2025-09-12 19:11:24,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 580.0, 1000.0, 224.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:11:24,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 53 minutes, 26 seconds)
2025-09-12 19:25:13,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:25:13,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:30:15,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3821.57812 ± 78.043
2025-09-12 19:30:15,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3866.2002, 3939.55, 3789.6213, 3692.5837, 3916.569, 3745.3477, 3743.8616, 3873.94, 3867.1199, 3780.9875]
2025-09-12 19:30:15,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:30:15,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 37 minutes, 54 seconds)
2025-09-12 19:41:44,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:41:44,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:46:48,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3916.74023 ± 58.737
2025-09-12 19:46:48,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3842.9285, 3798.1099, 3913.0393, 3927.835, 3960.697, 3915.195, 3991.1157, 3953.5667, 3985.0183, 3879.8987]
2025-09-12 19:46:48,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:46:48,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 18 minutes, 22 seconds)
2025-09-12 20:00:26,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:00:26,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:04:13,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 2882.97852 ± 1430.670
2025-09-12 20:04:13,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3852.2888, 3944.5657, 3979.4116, 3856.5994, 3826.5686, 3887.0942, 1008.8752, 728.3776, 3314.9202, 431.0831]
2025-09-12 20:04:13,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 298.0, 234.0, 848.0, 134.0]
2025-09-12 20:04:13,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 1 minute, 59 seconds)
2025-09-12 20:16:28,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:16:28,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:21:02,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3482.16284 ± 1093.691
2025-09-12 20:21:02,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [204.62265, 3787.5342, 3815.237, 3747.9465, 3860.2393, 3914.7, 3915.9417, 3853.4119, 3834.1357, 3887.8596]
2025-09-12 20:21:02,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:21:02,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 44 minutes, 17 seconds)
2025-09-12 20:33:55,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:33:55,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:38:58,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3896.61670 ± 43.237
2025-09-12 20:38:58,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3823.0793, 3875.4653, 3893.1667, 3919.0757, 3902.9712, 3914.8652, 3977.2197, 3938.3713, 3885.158, 3836.7974]
2025-09-12 20:38:58,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:38:58,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 27 minutes, 34 seconds)
2025-09-12 20:51:43,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:51:43,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:55:40,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3048.40967 ± 1461.904
2025-09-12 20:55:40,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3815.7163, 2332.5686, 3884.8245, 4001.512, 3941.4507, 3952.6113, 4013.6602, 257.4348, 314.11182, 3970.2068]
2025-09-12 20:55:40,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 603.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 111.0, 137.0, 1000.0]
2025-09-12 20:55:40,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 8 minutes, 20 seconds)
2025-09-12 21:08:04,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:08:04,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:12:46,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3551.25073 ± 896.610
2025-09-12 21:12:46,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3737.2542, 3916.6191, 3942.246, 869.1017, 3899.0586, 3832.4568, 3849.548, 3764.2002, 3779.8752, 3922.1477]
2025-09-12 21:12:46,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 276.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:12:46,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 51 minutes, 34 seconds)
2025-09-12 21:26:01,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:26:01,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:30:32,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3429.05273 ± 905.506
2025-09-12 21:30:32,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3889.918, 3812.2283, 3902.8245, 3816.612, 2070.166, 3953.6907, 3884.627, 3949.2607, 1248.6157, 3762.5842]
2025-09-12 21:30:32,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 588.0, 1000.0, 1000.0, 1000.0, 356.0, 1000.0]
2025-09-12 21:30:32,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 34 minutes, 31 seconds)
2025-09-12 21:43:03,579 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:43:03,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:47:46,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3693.51050 ± 497.885
2025-09-12 21:47:46,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [3872.2446, 3957.266, 3881.2195, 3968.6729, 3123.813, 3988.157, 2399.8372, 3762.3606, 3992.4153, 3989.1228]
2025-09-12 21:47:46,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 781.0, 1000.0, 683.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:47:46,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 20 seconds)
2025-09-12 22:00:12,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:00:12,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:04:51,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 3688.94067 ± 988.150
2025-09-12 22:04:51,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [4013.35, 4045.4336, 732.4745, 3884.754, 3938.4707, 3963.1824, 4031.978, 4022.808, 4133.066, 4123.8877]
2025-09-12 22:04:51,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 228.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:04:51,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc5-walker2d):1251 [DEBUG]: Training session finished
