2025-09-11 19:44:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:44:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:44:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x146495db9850>}
2025-09-11 19:44:19,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1111 [DEBUG]: using device: cuda
2025-09-11 19:44:19,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1133 [INFO]: Creating new trainer
2025-09-11 19:44:19,649 baseline-mbpac-noiseperc20-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 19:44:19,649 baseline-mbpac-noiseperc20-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:44:19,658 baseline-mbpac-noiseperc20-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 19:44:20,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1194 [DEBUG]: Starting training session...
2025-09-11 19:44:20,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 1/100
2025-09-11 19:54:10,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:54:10,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:54:49,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 140.83388 ± 125.239
2025-09-11 19:54:49,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [329.27927, 3.567822, 192.69383, 29.044195, 166.3611, 52.97901, 62.89045, 115.68245, 396.8331, 59.007603]
2025-09-11 19:54:49,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 14.0, 121.0, 58.0, 83.0, 162.0, 170.0, 236.0, 269.0, 171.0]
2025-09-11 19:54:49,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (140.83) for latency ExtremeClogL1U23
2025-09-11 19:54:49,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 18 minutes, 16 seconds)
2025-09-11 20:06:05,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:06:05,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:06:43,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 147.02664 ± 189.534
2025-09-11 20:06:43,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.8704145, 102.64267, 1.6727959, 0.9561673, 262.3218, 7.698816, 202.79109, 645.7815, 57.854027, 185.67711]
2025-09-11 20:06:43,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 207.0, 18.0, 11.0, 148.0, 20.0, 135.0, 367.0, 243.0, 253.0]
2025-09-11 20:06:43,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (147.03) for latency ExtremeClogL1U23
2025-09-11 20:06:43,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 16 minutes, 48 seconds)
2025-09-11 20:18:00,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:18:00,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:18:31,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 65.68541 ± 115.581
2025-09-11 20:18:31,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [60.988377, -24.352892, 29.438936, 404.0376, 54.19297, 38.97479, 3.3265862, 34.3438, 4.4876413, 51.41631]
2025-09-11 20:18:31,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 131.0, 62.0, 336.0, 202.0, 101.0, 18.0, 80.0, 16.0, 104.0]
2025-09-11 20:18:31,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 25 minutes, 22 seconds)
2025-09-11 20:29:55,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:29:55,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:30:56,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 240.02966 ± 181.189
2025-09-11 20:30:56,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [84.13185, 64.72307, 596.34863, 206.41006, 4.41833, 266.34723, 81.40146, 444.23987, 265.8709, 386.40533]
2025-09-11 20:30:56,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [223.0, 88.0, 514.0, 354.0, 20.0, 184.0, 205.0, 297.0, 134.0, 258.0]
2025-09-11 20:30:56,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (240.03) for latency ExtremeClogL1U23
2025-09-11 20:30:56,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 38 minutes, 10 seconds)
2025-09-11 20:42:03,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:42:03,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:42:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 292.45718 ± 123.923
2025-09-11 20:42:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [405.3415, 273.85953, 90.06136, 264.96207, 188.59859, 300.33252, 481.8427, 345.50552, 443.83337, 130.2343]
2025-09-11 20:42:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 167.0, 270.0, 141.0, 129.0, 135.0, 300.0, 208.0, 302.0, 136.0]
2025-09-11 20:42:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (292.46) for latency ExtremeClogL1U23
2025-09-11 20:42:58,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 33 minutes, 56 seconds)
2025-09-11 20:54:09,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:54:09,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:55:21,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 312.42801 ± 128.456
2025-09-11 20:55:21,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [177.87906, 440.23587, 314.86508, 82.45262, 407.18262, 344.3774, 362.95975, 538.0639, 228.51952, 227.7442]
2025-09-11 20:55:21,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [375.0, 300.0, 215.0, 155.0, 321.0, 207.0, 194.0, 374.0, 137.0, 390.0]
2025-09-11 20:55:21,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (312.43) for latency ExtremeClogL1U23
2025-09-11 20:55:21,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 57 minutes, 50 seconds)
2025-09-11 21:06:30,628 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:06:30,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:07:13,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 222.60059 ± 125.733
2025-09-11 21:07:13,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [351.3611, 41.972683, 223.246, 290.4204, 90.55755, 245.28609, 245.66376, 39.152542, 449.43674, 248.90881]
2025-09-11 21:07:13,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 47.0, 149.0, 154.0, 103.0, 138.0, 141.0, 87.0, 484.0, 171.0]
2025-09-11 21:07:13,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 45 minutes, 23 seconds)
2025-09-11 21:18:22,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:18:22,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:19:10,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 335.76401 ± 126.116
2025-09-11 21:19:10,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [314.54904, 330.864, 318.34723, 333.7326, 343.76303, 288.79324, 403.56747, 459.78656, 26.907751, 537.3291]
2025-09-11 21:19:10,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 178.0, 168.0, 229.0, 195.0, 157.0, 232.0, 206.0, 46.0, 249.0]
2025-09-11 21:19:10,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (335.76) for latency ExtremeClogL1U23
2025-09-11 21:19:10,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 35 minutes, 56 seconds)
2025-09-11 21:30:07,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:30:07,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:31:13,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 385.22952 ± 123.411
2025-09-11 21:31:13,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [357.9661, 245.51703, 249.8803, 516.3997, 490.58887, 434.2981, 235.79044, 615.7765, 405.2378, 300.84042]
2025-09-11 21:31:13,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 175.0, 188.0, 386.0, 261.0, 326.0, 114.0, 390.0, 264.0, 190.0]
2025-09-11 21:31:13,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (385.23) for latency ExtremeClogL1U23
2025-09-11 21:31:13,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 17 minutes, 15 seconds)
2025-09-11 21:42:06,475 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:42:06,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:42:56,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 384.32489 ± 165.821
2025-09-11 21:42:56,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [333.06693, 292.1972, 422.8283, 343.94073, 394.6658, 610.20465, 342.91785, -1.6994185, 586.2706, 518.85645]
2025-09-11 21:42:56,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 128.0, 215.0, 150.0, 188.0, 340.0, 191.0, 16.0, 267.0, 242.0]
2025-09-11 21:42:56,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 59 minutes, 35 seconds)
2025-09-11 21:54:02,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:54:02,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:54:55,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 360.74927 ± 133.557
2025-09-11 21:54:55,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [310.4059, 459.78802, 385.92575, 367.65805, 492.87018, 433.4142, 5.618674, 467.64383, 388.84683, 295.32095]
2025-09-11 21:54:55,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 362.0, 240.0, 182.0, 215.0, 301.0, 22.0, 220.0, 208.0, 182.0]
2025-09-11 21:54:55,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 40 minutes, 21 seconds)
2025-09-11 22:06:00,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:06:00,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:07:00,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 361.83701 ± 201.731
2025-09-11 22:07:00,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [577.2287, 287.51303, 540.66956, 395.3382, 349.67532, 76.23514, 288.70178, -1.8038208, 429.93057, 674.8817]
2025-09-11 22:07:00,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 173.0, 207.0, 235.0, 199.0, 122.0, 406.0, 22.0, 269.0, 318.0]
2025-09-11 22:07:00,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 32 minutes)
2025-09-11 22:17:57,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:17:57,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:18:27,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 171.74268 ± 154.266
2025-09-11 22:18:27,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [371.3845, 125.77495, 4.784285, 21.931425, 7.9332185, 371.95444, 114.27972, 302.19717, 367.43066, 29.756323]
2025-09-11 22:18:27,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 124.0, 14.0, 33.0, 21.0, 213.0, 175.0, 215.0, 166.0, 40.0]
2025-09-11 22:18:27,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 11 minutes, 35 seconds)
2025-09-11 22:29:31,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:29:31,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:30:38,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 399.99533 ± 177.071
2025-09-11 22:30:38,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [326.30066, 443.42484, 498.6147, 417.8268, 7.3579564, 709.4302, 572.7779, 297.2192, 403.405, 323.59616]
2025-09-11 22:30:38,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 291.0, 284.0, 255.0, 24.0, 593.0, 326.0, 158.0, 278.0, 166.0]
2025-09-11 22:30:38,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (400.00) for latency ExtremeClogL1U23
2025-09-11 22:30:38,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 1 minute, 56 seconds)
2025-09-11 22:41:43,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:41:43,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:42:45,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 405.19608 ± 75.729
2025-09-11 22:42:45,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [383.54346, 399.8715, 291.08856, 462.32126, 514.8665, 417.247, 383.42044, 400.68155, 517.4328, 281.48755]
2025-09-11 22:42:45,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 254.0, 170.0, 295.0, 263.0, 182.0, 244.0, 210.0, 327.0, 187.0]
2025-09-11 22:42:45,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (405.20) for latency ExtremeClogL1U23
2025-09-11 22:42:45,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 56 minutes, 43 seconds)
2025-09-11 22:54:12,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:54:12,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:54:51,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 266.97757 ± 176.135
2025-09-11 22:54:51,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [306.49664, 27.070316, 330.7284, 7.7708416, 5.3497357, 479.00116, 361.84982, 453.26566, 281.06342, 417.1796]
2025-09-11 22:54:51,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 40.0, 183.0, 20.0, 19.0, 250.0, 170.0, 190.0, 166.0, 277.0]
2025-09-11 22:54:51,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 46 minutes, 58 seconds)
2025-09-11 23:06:07,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:06:07,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:06:58,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 326.72247 ± 227.857
2025-09-11 23:06:58,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [232.40166, 358.3984, 840.89087, 257.9274, 522.8406, 363.81903, 50.02251, 414.18683, 1.3482445, 225.3891]
2025-09-11 23:06:58,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 215.0, 589.0, 161.0, 209.0, 181.0, 57.0, 217.0, 16.0, 156.0]
2025-09-11 23:06:58,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 35 minutes, 30 seconds)
2025-09-11 23:18:13,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:18:13,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:19:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 349.42615 ± 57.356
2025-09-11 23:19:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [278.8865, 437.39038, 386.71765, 259.45404, 353.14157, 343.0677, 439.78827, 352.17813, 300.06293, 343.57437]
2025-09-11 23:19:02,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 234.0, 197.0, 136.0, 186.0, 220.0, 243.0, 185.0, 146.0, 186.0]
2025-09-11 23:19:02,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 33 minutes, 26 seconds)
2025-09-11 23:30:19,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:30:19,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:31:16,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 406.82217 ± 126.091
2025-09-11 23:31:16,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [466.00528, 429.4761, 378.41852, 569.0047, 474.47864, 259.65463, 184.5926, 616.6468, 328.17273, 361.7716]
2025-09-11 23:31:16,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 233.0, 226.0, 329.0, 256.0, 152.0, 92.0, 335.0, 150.0, 185.0]
2025-09-11 23:31:16,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (406.82) for latency ExtremeClogL1U23
2025-09-11 23:31:16,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 22 minutes, 15 seconds)
2025-09-11 23:42:40,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:42:40,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:43:19,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 303.00604 ± 87.389
2025-09-11 23:43:19,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [259.93146, 203.2539, 227.4983, 220.20735, 416.66428, 331.75336, 476.42313, 355.26175, 309.6371, 229.42964]
2025-09-11 23:43:19,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 108.0, 147.0, 107.0, 161.0, 179.0, 180.0, 186.0, 164.0, 96.0]
2025-09-11 23:43:19,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 9 minutes)
2025-09-11 23:54:41,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:54:41,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:55:22,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 287.46936 ± 119.543
2025-09-11 23:55:22,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [275.37454, 477.86984, 216.99007, 375.1821, 224.7691, 278.6638, 291.90152, 372.17484, 353.97675, 7.790972]
2025-09-11 23:55:22,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 256.0, 143.0, 149.0, 121.0, 155.0, 155.0, 205.0, 192.0, 21.0]
2025-09-11 23:55:22,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 56 minutes, 4 seconds)
2025-09-12 00:06:36,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:06:36,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:07:16,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 269.72040 ± 104.528
2025-09-12 00:07:16,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [282.24078, 350.45078, 236.08426, 460.73187, 314.3894, 366.8237, 188.90115, 152.50009, 257.3487, 87.73328]
2025-09-12 00:07:16,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 144.0, 129.0, 287.0, 159.0, 216.0, 113.0, 66.0, 184.0, 87.0]
2025-09-12 00:07:16,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 40 minutes, 47 seconds)
2025-09-12 00:18:42,289 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:18:42,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:19:26,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 305.21863 ± 120.894
2025-09-12 00:19:26,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [410.15115, 338.12952, 229.62549, 433.6476, 28.0505, 236.4626, 469.15146, 348.38754, 268.78464, 289.79562]
2025-09-12 00:19:26,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 136.0, 99.0, 213.0, 34.0, 103.0, 347.0, 264.0, 127.0, 148.0]
2025-09-12 00:19:26,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 30 minutes, 15 seconds)
2025-09-12 00:30:55,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:30:55,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:31:38,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 382.03665 ± 102.126
2025-09-12 00:31:38,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [273.78165, 518.52814, 341.1674, 185.86798, 447.22214, 495.82623, 428.88907, 368.05713, 296.24484, 464.78174]
2025-09-12 00:31:38,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 184.0, 159.0, 95.0, 165.0, 224.0, 172.0, 157.0, 122.0, 214.0]
2025-09-12 00:31:38,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 17 minutes, 33 seconds)
2025-09-12 00:42:58,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:58,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:44:07,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 528.85925 ± 261.598
2025-09-12 00:44:07,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [171.11963, 464.93674, 992.64056, 867.4076, 234.44887, 387.13968, 534.74774, 347.13208, 811.8266, 477.19278]
2025-09-12 00:44:07,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [103.0, 169.0, 389.0, 393.0, 125.0, 218.0, 249.0, 232.0, 528.0, 188.0]
2025-09-12 00:44:07,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (528.86) for latency ExtremeClogL1U23
2025-09-12 00:44:07,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 12 minutes, 6 seconds)
2025-09-12 00:55:29,779 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:55:29,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:56:05,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 277.17242 ± 74.590
2025-09-12 00:56:05,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [363.60495, 199.21857, 264.45245, 209.61661, 310.5508, 176.21019, 195.04047, 325.93454, 395.25974, 331.83618]
2025-09-12 00:56:05,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 99.0, 113.0, 112.0, 125.0, 90.0, 108.0, 139.0, 177.0, 196.0]
2025-09-12 00:56:05,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 58 minutes, 40 seconds)
2025-09-12 01:07:19,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:07:19,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:08:05,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 411.48785 ± 232.730
2025-09-12 01:08:05,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [289.01324, 470.3548, 254.25757, 320.77045, 227.7947, 288.00858, 329.61325, 326.21542, 1047.7875, 561.063]
2025-09-12 01:08:05,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 182.0, 135.0, 142.0, 114.0, 111.0, 132.0, 151.0, 433.0, 243.0]
2025-09-12 01:08:05,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 47 minutes, 46 seconds)
2025-09-12 01:19:23,339 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:19:23,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:20:17,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 480.10834 ± 313.638
2025-09-12 01:20:17,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [259.2799, 496.5225, 262.601, 598.12445, 1089.4469, 229.47824, 457.51837, 460.68842, 5.557535, 941.8663]
2025-09-12 01:20:17,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 189.0, 131.0, 233.0, 412.0, 108.0, 181.0, 241.0, 16.0, 400.0]
2025-09-12 01:20:17,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 36 minutes, 15 seconds)
2025-09-12 01:31:33,769 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:31:33,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:32:18,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 413.92706 ± 217.976
2025-09-12 01:32:18,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [317.77655, 273.08472, 324.37393, 193.62263, 788.1204, 208.50703, 779.251, 338.03586, 279.17816, 637.3203]
2025-09-12 01:32:18,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 114.0, 141.0, 91.0, 277.0, 101.0, 258.0, 140.0, 112.0, 279.0]
2025-09-12 01:32:18,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 21 minutes, 29 seconds)
2025-09-12 01:43:20,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:43:20,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:44:01,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 400.92535 ± 346.057
2025-09-12 01:44:01,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [730.68567, 834.62915, 15.332244, 888.9816, 244.69902, 6.247134, 3.1117313, 767.2493, 281.61887, 236.6984]
2025-09-12 01:44:01,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 276.0, 27.0, 303.0, 111.0, 19.0, 15.0, 326.0, 136.0, 116.0]
2025-09-12 01:44:01,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 58 minutes, 33 seconds)
2025-09-12 01:55:13,420 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:55:13,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:55:56,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 440.49130 ± 261.932
2025-09-12 01:55:56,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [786.7585, 682.01514, 302.07266, 840.9529, 7.3550177, 232.5063, 499.45142, 566.6338, 227.42574, 259.7418]
2025-09-12 01:55:56,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 236.0, 129.0, 289.0, 19.0, 119.0, 174.0, 204.0, 102.0, 114.0]
2025-09-12 01:55:56,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 45 minutes, 48 seconds)
2025-09-12 02:06:50,017 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:06:50,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:07:32,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 365.79330 ± 231.507
2025-09-12 02:07:32,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [316.3676, 376.49878, 518.5816, 280.26852, 532.37897, 9.116124, 464.69833, 332.13422, 821.5097, 6.3791847]
2025-09-12 02:07:32,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 151.0, 218.0, 116.0, 203.0, 18.0, 281.0, 168.0, 335.0, 18.0]
2025-09-12 02:07:32,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 28 minutes, 34 seconds)
2025-09-12 02:18:38,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:18:38,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:19:29,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 484.95746 ± 361.718
2025-09-12 02:19:29,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [812.8759, 1.0790218, 3.6640594, 1120.3768, 854.2425, 565.66547, 255.58887, 280.07285, 694.52527, 261.48395]
2025-09-12 02:19:29,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 20.0, 15.0, 485.0, 328.0, 208.0, 121.0, 126.0, 223.0, 113.0]
2025-09-12 02:19:29,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 13 minutes, 4 seconds)
2025-09-12 02:30:32,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:30:32,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:31:12,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 386.65051 ± 110.997
2025-09-12 02:31:12,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [436.77493, 440.2572, 349.13754, 247.86832, 474.26303, 540.68774, 540.2203, 348.9164, 252.78903, 235.59067]
2025-09-12 02:31:12,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 157.0, 160.0, 106.0, 173.0, 201.0, 181.0, 138.0, 110.0, 110.0]
2025-09-12 02:31:12,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 57 minutes, 34 seconds)
2025-09-12 02:42:17,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:42:17,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:42:52,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 306.46252 ± 280.038
2025-09-12 02:42:52,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.3514404, 4.6555924, 905.59875, 575.27795, 221.80537, 348.5264, -1.3879391, 539.746, 258.59164, 208.46005]
2025-09-12 02:42:52,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 340.0, 248.0, 128.0, 168.0, 15.0, 203.0, 113.0, 101.0]
2025-09-12 02:42:52,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 45 minutes, 10 seconds)
2025-09-12 02:53:54,437 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:53:54,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:54:31,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 344.42615 ± 107.476
2025-09-12 02:54:31,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [252.88911, 628.50793, 298.52118, 282.74142, 252.42621, 374.30063, 315.69568, 277.75818, 421.0695, 340.35153]
2025-09-12 02:54:31,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 243.0, 123.0, 116.0, 106.0, 151.0, 122.0, 125.0, 157.0, 140.0]
2025-09-12 02:54:31,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 29 minutes, 50 seconds)
2025-09-12 03:05:44,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:05:44,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:06:16,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 280.19522 ± 293.371
2025-09-12 03:06:16,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [277.13153, 212.87514, 820.5015, 395.25046, 3.8297713, 3.329549, 6.3098474, 10.767972, 780.3864, 291.5702]
2025-09-12 03:06:16,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [128.0, 102.0, 313.0, 154.0, 18.0, 16.0, 20.0, 23.0, 358.0, 137.0]
2025-09-12 03:06:16,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 20 minutes, 3 seconds)
2025-09-12 03:17:10,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:17:10,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:18:07,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 547.98975 ± 337.988
2025-09-12 03:18:07,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [547.1907, 263.9728, 694.2021, 284.4235, 49.605427, 383.934, 833.4956, 956.54895, 1168.27, 298.25427]
2025-09-12 03:18:07,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [237.0, 111.0, 251.0, 131.0, 50.0, 164.0, 271.0, 363.0, 469.0, 127.0]
2025-09-12 03:18:07,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (547.99) for latency ExtremeClogL1U23
2025-09-12 03:18:07,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 7 minutes, 8 seconds)
2025-09-12 03:29:22,085 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:29:22,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:30:15,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 541.25995 ± 153.613
2025-09-12 03:30:15,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [619.82135, 663.0199, 386.7348, 536.68207, 529.4274, 359.89902, 824.8084, 632.89514, 280.7692, 578.54236]
2025-09-12 03:30:15,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 230.0, 145.0, 208.0, 189.0, 135.0, 327.0, 230.0, 126.0, 215.0]
2025-09-12 03:30:15,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 22 seconds)
2025-09-12 03:41:16,717 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:41:16,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:42:14,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 571.01093 ± 296.753
2025-09-12 03:42:14,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [207.84302, 541.17883, 637.2539, 411.53793, 522.3189, 861.45465, 838.1412, 1053.0145, 635.12286, 2.2429588]
2025-09-12 03:42:14,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [99.0, 210.0, 206.0, 169.0, 210.0, 298.0, 366.0, 369.0, 304.0, 22.0]
2025-09-12 03:42:14,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (571.01) for latency ExtremeClogL1U23
2025-09-12 03:42:14,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 52 minutes, 20 seconds)
2025-09-12 03:53:16,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:53:16,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:54:22,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 694.52625 ± 241.107
2025-09-12 03:54:22,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [906.2867, 643.07, 808.98895, 831.0938, 460.21402, 336.94394, 988.99646, 451.76535, 1046.5026, 471.40082]
2025-09-12 03:54:22,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 227.0, 273.0, 286.0, 178.0, 128.0, 326.0, 164.0, 457.0, 168.0]
2025-09-12 03:54:22,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (694.53) for latency ExtremeClogL1U23
2025-09-12 03:54:22,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 46 minutes, 14 seconds)
2025-09-12 04:05:18,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:05:18,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:06:27,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 685.17761 ± 329.075
2025-09-12 04:06:27,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [479.99063, 407.90656, 1038.9735, 502.1251, 840.3171, 470.9935, 236.12766, 609.1951, 1384.7277, 881.41943]
2025-09-12 04:06:27,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 171.0, 432.0, 192.0, 300.0, 225.0, 96.0, 211.0, 520.0, 336.0]
2025-09-12 04:06:27,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 38 minutes, 6 seconds)
2025-09-12 04:17:54,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:17:54,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:19:17,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 847.38904 ± 400.536
2025-09-12 04:19:17,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [650.09845, 1801.6414, 497.46008, 489.42413, 1073.2386, 1044.0759, 447.40643, 1120.9736, 774.9643, 574.60785]
2025-09-12 04:19:17,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 719.0, 171.0, 183.0, 377.0, 413.0, 166.0, 411.0, 292.0, 228.0]
2025-09-12 04:19:17,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (847.39) for latency ExtremeClogL1U23
2025-09-12 04:19:17,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 37 minutes, 19 seconds)
2025-09-12 04:29:57,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:29:57,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:30:57,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 650.76215 ± 240.058
2025-09-12 04:30:57,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [657.22296, 310.86514, 525.37585, 507.367, 902.93646, 459.47568, 1203.7698, 541.76556, 729.47546, 669.3675]
2025-09-12 04:30:57,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 149.0, 169.0, 171.0, 314.0, 171.0, 422.0, 178.0, 267.0, 250.0]
2025-09-12 04:30:57,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 19 minutes, 49 seconds)
2025-09-12 04:41:58,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:41:58,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:43:01,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 735.52405 ± 262.921
2025-09-12 04:43:01,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [956.04517, 872.8436, 7.3049126, 816.653, 853.8173, 614.38824, 775.172, 935.3435, 849.4512, 674.2211]
2025-09-12 04:43:01,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 305.0, 19.0, 258.0, 267.0, 204.0, 281.0, 307.0, 284.0, 216.0]
2025-09-12 04:43:01,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 8 minutes, 38 seconds)
2025-09-12 04:54:07,857 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:54:07,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:55:12,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 707.05823 ± 413.912
2025-09-12 04:55:12,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [200.0801, 712.99854, 495.56015, 461.2872, 1756.9479, 846.7959, 739.768, 271.85492, 902.4508, 682.8387]
2025-09-12 04:55:12,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 229.0, 169.0, 162.0, 598.0, 260.0, 270.0, 153.0, 328.0, 241.0]
2025-09-12 04:55:12,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 56 minutes, 59 seconds)
2025-09-12 05:06:17,773 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:06:17,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:07:33,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 841.46204 ± 231.949
2025-09-12 05:07:33,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [691.4799, 717.2853, 909.07306, 541.49615, 1070.4094, 1395.6085, 742.9111, 681.43085, 905.1145, 759.8114]
2025-09-12 05:07:33,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 239.0, 307.0, 178.0, 337.0, 579.0, 258.0, 231.0, 299.0, 250.0]
2025-09-12 05:07:33,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 47 minutes, 37 seconds)
2025-09-12 05:18:27,823 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:18:27,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:19:40,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 802.45343 ± 200.582
2025-09-12 05:19:40,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [925.1002, 757.5439, 769.2634, 1058.0487, 515.15045, 1148.1835, 816.8617, 763.0931, 807.9254, 463.36395]
2025-09-12 05:19:40,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [320.0, 258.0, 300.0, 363.0, 171.0, 412.0, 284.0, 243.0, 283.0, 154.0]
2025-09-12 05:19:40,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 27 minutes, 58 seconds)
2025-09-12 05:30:47,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:30:47,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:31:59,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 855.60107 ± 255.191
2025-09-12 05:31:59,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1075.305, 1532.2029, 778.835, 815.67645, 744.41003, 826.6345, 724.1918, 572.6296, 713.03845, 773.087]
2025-09-12 05:31:59,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [362.0, 480.0, 244.0, 247.0, 247.0, 273.0, 250.0, 190.0, 241.0, 243.0]
2025-09-12 05:31:59,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (855.60) for latency ExtremeClogL1U23
2025-09-12 05:31:59,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 22 minutes, 32 seconds)
2025-09-12 05:43:02,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:02,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:44:05,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 707.31299 ± 222.007
2025-09-12 05:44:05,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [807.1992, 609.7397, 1203.0176, 552.63617, 570.9917, 917.4768, 588.87915, 472.84375, 488.21768, 862.12775]
2025-09-12 05:44:05,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [258.0, 223.0, 447.0, 184.0, 190.0, 293.0, 204.0, 173.0, 178.0, 271.0]
2025-09-12 05:44:05,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 10 minutes, 39 seconds)
2025-09-12 05:55:21,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:55:21,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:56:36,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 853.40186 ± 142.802
2025-09-12 05:56:36,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [817.9626, 723.74536, 1021.13184, 1032.1573, 804.9609, 869.0605, 788.6648, 1045.1223, 864.362, 566.85065]
2025-09-12 05:56:36,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 224.0, 356.0, 337.0, 276.0, 303.0, 281.0, 348.0, 277.0, 186.0]
2025-09-12 05:56:36,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 1 minute, 50 seconds)
2025-09-12 06:07:37,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:07:37,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:08:50,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 835.41132 ± 265.662
2025-09-12 06:08:50,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [628.2581, 824.8, 1299.6665, 807.91644, 757.1359, 810.52026, 968.1749, 1265.968, 440.71686, 550.95557]
2025-09-12 06:08:50,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 254.0, 460.0, 273.0, 235.0, 282.0, 332.0, 409.0, 162.0, 185.0]
2025-09-12 06:08:50,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 48 minutes, 16 seconds)
2025-09-12 06:19:54,681 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:19:54,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:21:24,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1011.46796 ± 557.545
2025-09-12 06:21:24,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [553.624, 2434.4644, 1545.7599, 1087.1581, 559.18726, 853.43604, 1004.97217, 518.99097, 811.84314, 745.2446]
2025-09-12 06:21:24,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 930.0, 497.0, 336.0, 193.0, 273.0, 320.0, 189.0, 282.0, 237.0]
2025-09-12 06:21:24,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1011.47) for latency ExtremeClogL1U23
2025-09-12 06:21:24,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 40 minutes, 20 seconds)
2025-09-12 06:32:47,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:32:47,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:33:40,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 563.61066 ± 360.310
2025-09-12 06:33:40,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [957.3167, -1.842506, 806.81525, 651.03784, 810.1485, 992.62024, 164.06477, 484.31787, 761.7932, 9.834819]
2025-09-12 06:33:40,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [292.0, 23.0, 373.0, 217.0, 270.0, 318.0, 128.0, 162.0, 243.0, 34.0]
2025-09-12 06:33:40,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 27 minutes, 29 seconds)
2025-09-12 06:44:21,156 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:44:21,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:46:00,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1179.54272 ± 297.536
2025-09-12 06:46:00,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1470.9133, 1763.0929, 1000.336, 1150.5537, 1314.5317, 1427.6265, 753.473, 837.2678, 972.37354, 1105.2588]
2025-09-12 06:46:00,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [466.0, 560.0, 331.0, 356.0, 434.0, 487.0, 234.0, 260.0, 303.0, 377.0]
2025-09-12 06:46:00,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1179.54) for latency ExtremeClogL1U23
2025-09-12 06:46:00,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 17 minutes, 14 seconds)
2025-09-12 06:57:11,875 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:57:11,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:58:20,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 782.32062 ± 394.803
2025-09-12 06:58:20,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2.592852, 710.3725, 1539.0933, 743.7633, 830.93494, 1260.7656, 694.3364, 630.8652, 484.42664, 926.05493]
2025-09-12 06:58:20,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 264.0, 513.0, 231.0, 254.0, 450.0, 248.0, 203.0, 163.0, 272.0]
2025-09-12 06:58:20,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 3 minutes, 11 seconds)
2025-09-12 07:09:30,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:09:30,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:11:03,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1092.43408 ± 362.206
2025-09-12 07:11:03,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1788.3375, 1002.80194, 999.68524, 872.0903, 1289.3445, 888.34186, 561.7275, 789.41656, 1647.8657, 1084.7292]
2025-09-12 07:11:03,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [582.0, 324.0, 311.0, 272.0, 489.0, 287.0, 188.0, 238.0, 523.0, 367.0]
2025-09-12 07:11:03,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 55 minutes, 5 seconds)
2025-09-12 07:22:17,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:22:17,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:23:31,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 876.23816 ± 336.623
2025-09-12 07:23:31,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1053.4321, 732.056, 229.68007, 1008.44666, 1010.8674, 791.95465, 1197.4309, 1486.6997, 528.2801, 723.5346]
2025-09-12 07:23:31,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [337.0, 262.0, 111.0, 304.0, 340.0, 258.0, 399.0, 429.0, 177.0, 263.0]
2025-09-12 07:23:31,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 41 minutes, 47 seconds)
2025-09-12 07:34:22,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:34:22,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:36:08,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1250.96948 ± 887.546
2025-09-12 07:36:08,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [983.4925, 6.410158, 7.524027, 2251.9568, 1614.0275, 1178.4124, 1097.1797, 831.1538, 3074.7224, 1464.8167]
2025-09-12 07:36:08,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 24.0, 20.0, 755.0, 497.0, 373.0, 430.0, 297.0, 1000.0, 450.0]
2025-09-12 07:36:08,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1250.97) for latency ExtremeClogL1U23
2025-09-12 07:36:08,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 32 minutes, 10 seconds)
2025-09-12 07:47:03,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:47:03,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:49:02,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1447.24255 ± 638.245
2025-09-12 07:49:02,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1374.6139, 1094.9746, 634.7281, 1290.0527, 1142.2085, 2895.0466, 1108.1832, 2184.9646, 926.52515, 1821.1279]
2025-09-12 07:49:02,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [447.0, 374.0, 221.0, 433.0, 373.0, 1000.0, 368.0, 678.0, 312.0, 592.0]
2025-09-12 07:49:02,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1447.24) for latency ExtremeClogL1U23
2025-09-12 07:49:02,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 24 minutes, 14 seconds)
2025-09-12 07:59:20,978 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:20,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:01:18,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1442.93469 ± 793.626
2025-09-12 08:01:18,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [551.56195, 867.03, 1238.8025, 2309.9128, 3054.4844, 1972.5322, 625.0138, 903.72754, 932.8625, 1973.4187]
2025-09-12 08:01:18,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 303.0, 390.0, 724.0, 981.0, 582.0, 222.0, 272.0, 292.0, 588.0]
2025-09-12 08:01:18,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 11 minutes, 10 seconds)
2025-09-12 08:14:27,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:14:27,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:15:33,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 677.66699 ± 756.842
2025-09-12 08:15:33,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1978.9663, 1672.0654, 1557.0275, 779.0053, 10.789396, 11.28503, 1.2770538, 6.0529156, 0.6990361, 759.5018]
2025-09-12 08:15:33,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [678.0, 636.0, 463.0, 270.0, 23.0, 22.0, 30.0, 19.0, 25.0, 249.0]
2025-09-12 08:15:33,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 10 minutes, 13 seconds)
2025-09-12 08:27:04,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:27:04,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:28:25,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 925.02509 ± 502.740
2025-09-12 08:28:25,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1197.4398, 1096.8342, 963.48083, 1706.1816, 1005.089, 4.754261, 971.75055, 1159.3767, 1144.2224, 1.1208854]
2025-09-12 08:28:25,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [380.0, 348.0, 305.0, 516.0, 299.0, 16.0, 322.0, 360.0, 398.0, 18.0]
2025-09-12 08:28:25,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 11 seconds)
2025-09-12 08:40:35,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:40:35,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:42:46,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1449.70972 ± 571.755
2025-09-12 08:42:46,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1309.8153, 1115.3802, 2899.3723, 1475.0667, 1101.208, 758.4804, 1680.0833, 1803.8269, 943.4966, 1410.3671]
2025-09-12 08:42:46,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [384.0, 356.0, 1000.0, 443.0, 344.0, 220.0, 556.0, 558.0, 324.0, 469.0]
2025-09-12 08:42:46,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1449.71) for latency ExtremeClogL1U23
2025-09-12 08:42:46,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 59 minutes, 50 seconds)
2025-09-12 08:55:09,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:55:09,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:57:11,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1349.02808 ± 776.368
2025-09-12 08:57:11,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1116.993, 981.18976, 1841.6868, 12.324232, 1055.2273, 2211.76, 1857.3538, 2802.3977, 898.06146, 713.28577]
2025-09-12 08:57:11,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [384.0, 325.0, 604.0, 40.0, 343.0, 669.0, 591.0, 902.0, 302.0, 241.0]
2025-09-12 08:57:11,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 57 minutes, 4 seconds)
2025-09-12 09:08:34,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:08:34,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:10:06,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1064.74329 ± 583.422
2025-09-12 09:10:06,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [820.04193, 2619.8657, 648.30585, 1597.3945, 1081.7382, 702.85297, 646.25867, 742.5272, 898.04205, 890.40533]
2025-09-12 09:10:06,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [252.0, 793.0, 213.0, 487.0, 323.0, 224.0, 230.0, 259.0, 268.0, 279.0]
2025-09-12 09:10:06,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 47 minutes, 48 seconds)
2025-09-12 09:22:54,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:22:54,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:25:36,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1751.49768 ± 585.213
2025-09-12 09:25:36,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1992.6542, 1741.2529, 741.6425, 2665.1116, 1411.7806, 1250.4386, 1231.3674, 2008.086, 1824.5944, 2648.0483]
2025-09-12 09:25:36,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [654.0, 552.0, 243.0, 1000.0, 439.0, 412.0, 379.0, 657.0, 617.0, 873.0]
2025-09-12 09:25:36,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1751.50) for latency ExtremeClogL1U23
2025-09-12 09:25:36,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 42 minutes, 21 seconds)
2025-09-12 09:36:37,176 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:36:37,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:39:24,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1932.32544 ± 952.363
2025-09-12 09:39:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1173.1196, 1318.724, 781.7768, 3130.17, 2139.8142, 2892.1003, 3470.3296, 1579.7686, 2248.4587, 588.9927]
2025-09-12 09:39:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [371.0, 418.0, 254.0, 965.0, 664.0, 901.0, 1000.0, 501.0, 708.0, 203.0]
2025-09-12 09:39:24,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (1932.33) for latency ExtremeClogL1U23
2025-09-12 09:39:24,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 34 minutes, 16 seconds)
2025-09-12 09:51:32,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:51:32,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:54:52,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2194.81567 ± 667.136
2025-09-12 09:54:52,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2280.7651, 2959.2437, 2468.27, 2324.2126, 845.7597, 2692.74, 2561.1113, 1361.8983, 1567.698, 2886.4595]
2025-09-12 09:54:52,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [724.0, 851.0, 747.0, 709.0, 285.0, 848.0, 1000.0, 468.0, 483.0, 1000.0]
2025-09-12 09:54:52,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2194.82) for latency ExtremeClogL1U23
2025-09-12 09:54:52,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 26 minutes, 57 seconds)
2025-09-12 10:06:52,185 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:06:52,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:09:21,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1637.43726 ± 969.561
2025-09-12 10:09:21,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1810.491, 1871.904, 2995.75, 919.7518, 2782.9158, 2822.9468, 1351.0542, -1.1164304, 1365.1438, 455.53262]
2025-09-12 10:09:21,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [541.0, 619.0, 1000.0, 296.0, 860.0, 1000.0, 405.0, 13.0, 487.0, 178.0]
2025-09-12 10:09:21,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 13 minutes)
2025-09-12 10:21:28,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:21:28,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:23:26,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1345.05750 ± 422.635
2025-09-12 10:23:26,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [795.32715, 1945.995, 1912.5741, 1209.7609, 963.6389, 878.0481, 1405.7219, 1215.9039, 1952.0579, 1171.5466]
2025-09-12 10:23:26,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 570.0, 589.0, 401.0, 289.0, 288.0, 435.0, 390.0, 591.0, 382.0]
2025-09-12 10:23:26,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 5 minutes, 20 seconds)
2025-09-12 10:35:49,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:35:49,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:38:46,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2013.14648 ± 844.834
2025-09-12 10:38:46,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2865.0347, 3166.0215, 1185.728, 1381.245, 1412.743, 2837.551, 1230.3652, 3156.0256, 1000.207, 1896.544]
2025-09-12 10:38:46,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 955.0, 381.0, 433.0, 425.0, 897.0, 384.0, 1000.0, 308.0, 597.0]
2025-09-12 10:38:46,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 49 minutes, 43 seconds)
2025-09-12 10:50:53,180 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:50:53,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:53:27,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1769.97388 ± 755.988
2025-09-12 10:53:27,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1591.903, 2716.7065, 1947.792, 1001.02716, 2472.7017, 1516.0114, 2075.876, 2336.0508, 2038.0626, 3.6074653]
2025-09-12 10:53:27,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [492.0, 822.0, 617.0, 355.0, 744.0, 452.0, 661.0, 790.0, 609.0, 18.0]
2025-09-12 10:53:27,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 39 minutes, 51 seconds)
2025-09-12 11:05:32,233 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:05:32,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:08:46,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2174.55713 ± 883.369
2025-09-12 11:08:46,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1066.6294, 1663.3568, 1265.0415, 3041.747, 1786.0015, 3213.7012, 878.6374, 2951.3953, 2779.3826, 3099.679]
2025-09-12 11:08:46,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [364.0, 529.0, 390.0, 1000.0, 549.0, 1000.0, 305.0, 1000.0, 898.0, 987.0]
2025-09-12 11:08:46,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 24 minutes, 18 seconds)
2025-09-12 11:20:52,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:20:52,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:23:55,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2136.86670 ± 863.396
2025-09-12 11:23:55,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [640.4475, 2422.4614, 1543.4879, 2541.968, 1187.8359, 3160.1194, 1275.9052, 3162.7095, 2397.9841, 3035.746]
2025-09-12 11:23:55,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [221.0, 753.0, 488.0, 771.0, 380.0, 1000.0, 391.0, 1000.0, 729.0, 919.0]
2025-09-12 11:23:55,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 12 minutes, 50 seconds)
2025-09-12 11:36:04,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:36:04,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:39:37,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2483.38037 ± 574.299
2025-09-12 11:39:37,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3268.8892, 2810.4246, 1565.1694, 2084.0215, 2335.956, 2918.3423, 3314.0774, 2015.205, 1841.0892, 2680.6313]
2025-09-12 11:39:37,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 864.0, 501.0, 626.0, 730.0, 941.0, 1000.0, 603.0, 560.0, 858.0]
2025-09-12 11:39:37,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2483.38) for latency ExtremeClogL1U23
2025-09-12 11:39:37,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 5 minutes, 39 seconds)
2025-09-12 11:52:15,576 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:52:15,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:54:36,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1587.17542 ± 987.260
2025-09-12 11:54:36,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [870.20795, 3.4886372, 586.6994, 994.61066, 2908.5068, 2164.1309, 3141.521, 1748.3524, 2364.5784, 1089.6575]
2025-09-12 11:54:36,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [307.0, 21.0, 215.0, 314.0, 882.0, 693.0, 1000.0, 520.0, 739.0, 359.0]
2025-09-12 11:54:36,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 48 minutes, 49 seconds)
2025-09-12 12:05:59,771 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:05:59,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:09:26,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2471.97974 ± 622.332
2025-09-12 12:09:26,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3314.2944, 2158.065, 2131.3684, 2476.4998, 1319.8075, 1989.3013, 2418.5198, 2317.8137, 3285.5432, 3308.585]
2025-09-12 12:09:26,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 673.0, 607.0, 742.0, 407.0, 581.0, 695.0, 720.0, 1000.0, 1000.0]
2025-09-12 12:09:26,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 34 minutes, 20 seconds)
2025-09-12 12:21:18,632 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:21:18,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:24:44,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2243.02930 ± 820.592
2025-09-12 12:24:44,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3063.99, 1744.4254, 1974.3964, 2812.033, 2074.8037, 745.61884, 3300.7158, 1107.0724, 2572.277, 3034.961]
2025-09-12 12:24:44,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 573.0, 622.0, 1000.0, 649.0, 292.0, 1000.0, 344.0, 815.0, 1000.0]
2025-09-12 12:24:44,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 19 minutes, 2 seconds)
2025-09-12 12:37:16,037 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:37:16,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:39:34,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1606.43872 ± 1099.244
2025-09-12 12:39:34,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1119.7942, 3385.7188, 1276.0363, 1149.4747, 1619.6348, 2104.6636, 2170.9502, 4.563767, 3233.7864, -0.23489161]
2025-09-12 12:39:34,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [352.0, 1000.0, 395.0, 352.0, 469.0, 622.0, 667.0, 21.0, 992.0, 19.0]
2025-09-12 12:39:34,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 2 minutes, 32 seconds)
2025-09-12 12:52:01,294 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:52:01,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:54:17,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1531.20776 ± 839.839
2025-09-12 12:54:17,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1214.8984, 1963.5874, 701.3578, -0.846623, 947.73254, 2288.053, 1331.9031, 2482.4282, 1484.1208, 2898.8428]
2025-09-12 12:54:17,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [369.0, 594.0, 281.0, 16.0, 336.0, 694.0, 421.0, 730.0, 457.0, 912.0]
2025-09-12 12:54:17,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 43 minutes, 43 seconds)
2025-09-12 13:05:40,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:05:40,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:08:23,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1973.46289 ± 1047.925
2025-09-12 13:08:23,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3.033085, 3501.0122, 2478.9026, 2685.9956, 1192.9752, 3127.102, 2813.3032, 1416.7762, 1151.5356, 1363.9934]
2025-09-12 13:08:23,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 1000.0, 718.0, 780.0, 370.0, 1000.0, 804.0, 426.0, 345.0, 410.0]
2025-09-12 13:08:23,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 25 minutes, 38 seconds)
2025-09-12 13:21:24,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:21:24,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:24:07,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1892.18481 ± 686.020
2025-09-12 13:24:07,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1920.0819, 1531.5604, 1925.0541, 1417.3109, 754.4432, 2317.562, 3134.4365, 2682.367, 2147.312, 1091.7188]
2025-09-12 13:24:07,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [565.0, 462.0, 590.0, 447.0, 251.0, 720.0, 931.0, 811.0, 693.0, 327.0]
2025-09-12 13:24:07,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 13 minutes, 56 seconds)
2025-09-12 13:35:23,068 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:35:23,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:38:51,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2396.42749 ± 1096.827
2025-09-12 13:38:51,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3187.29, 3290.2212, 3239.9692, 1869.0555, 6.121751, 1292.245, 1520.9587, 3055.3665, 3150.3638, 3352.6833]
2025-09-12 13:38:51,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [936.0, 1000.0, 1000.0, 582.0, 17.0, 424.0, 450.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:38:51,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 57 minutes, 11 seconds)
2025-09-12 13:51:08,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:51:08,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:54:15,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2259.21338 ± 921.294
2025-09-12 13:54:15,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1484.4594, 1428.2893, 1570.8523, 3339.0757, 3304.1624, 1375.4354, 3312.1558, 2364.033, 3337.9995, 1075.671]
2025-09-12 13:54:15,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [450.0, 457.0, 447.0, 1000.0, 1000.0, 423.0, 1000.0, 661.0, 1000.0, 323.0]
2025-09-12 13:54:15,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 44 minutes, 4 seconds)
2025-09-12 14:06:12,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:06:12,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:10:23,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2934.83960 ± 666.794
2025-09-12 14:10:23,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3385.583, 1700.3575, 3128.851, 3162.194, 3342.7258, 3249.18, 3426.3892, 3270.582, 3149.201, 1533.3314]
2025-09-12 14:10:23,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 560.0, 914.0, 1000.0, 984.0, 1000.0, 986.0, 1000.0, 1000.0, 499.0]
2025-09-12 14:10:23,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (2934.84) for latency ExtremeClogL1U23
2025-09-12 14:10:23,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 33 minutes, 4 seconds)
2025-09-12 14:23:38,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:23:38,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:26:08,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1702.91602 ± 1110.482
2025-09-12 14:26:08,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1378.6897, 3238.985, 1071.2142, 6.1323843, 3121.9473, 2.6378324, 1618.3848, 2743.8767, 1425.5635, 2421.7285]
2025-09-12 14:26:08,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [427.0, 1000.0, 358.0, 21.0, 1000.0, 14.0, 498.0, 848.0, 472.0, 728.0]
2025-09-12 14:26:08,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 22 minutes, 9 seconds)
2025-09-12 14:37:10,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:37:10,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:39:56,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 1954.32849 ± 542.101
2025-09-12 14:39:56,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2287.283, 1633.1958, 2108.411, 1542.6067, 3336.0447, 1989.7622, 1926.4237, 1876.2188, 1574.5951, 1268.7444]
2025-09-12 14:39:56,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [687.0, 502.0, 642.0, 485.0, 1000.0, 603.0, 561.0, 573.0, 511.0, 391.0]
2025-09-12 14:39:56,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 1 minute, 56 seconds)
2025-09-12 14:52:44,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:52:44,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:56:54,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2917.40869 ± 636.165
2025-09-12 14:56:54,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3295.4092, 2971.2278, 2318.5732, 3248.0068, 3377.437, 2400.0034, 3299.048, 1405.9535, 3431.7053, 3426.7239]
2025-09-12 14:56:54,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 749.0, 1000.0, 1000.0, 777.0, 1000.0, 423.0, 1000.0, 1000.0]
2025-09-12 14:56:54,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 51 minutes, 42 seconds)
2025-09-12 15:08:26,573 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:08:26,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:11:27,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2093.50659 ± 1100.345
2025-09-12 15:11:27,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3252.7124, 2226.2034, 2976.7717, 7.543002, 1623.3844, 3474.5925, 3180.7378, 774.685, 2076.4282, 1342.0082]
2025-09-12 15:11:27,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 679.0, 868.0, 19.0, 544.0, 1000.0, 982.0, 255.0, 636.0, 401.0]
2025-09-12 15:11:27,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 34 minutes, 23 seconds)
2025-09-12 15:23:34,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:23:34,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:27:27,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2711.00195 ± 816.387
2025-09-12 15:27:27,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2601.9868, 3280.6692, 970.4344, 3251.9902, 3248.551, 3391.5117, 3327.2253, 3331.9038, 1762.9761, 1942.7703]
2025-09-12 15:27:27,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [818.0, 1000.0, 323.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 544.0, 605.0]
2025-09-12 15:27:27,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 18 minutes, 43 seconds)
2025-09-12 15:40:24,280 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:40:24,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:43:37,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2309.29468 ± 913.748
2025-09-12 15:43:37,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3158.86, 1082.4366, 2093.3118, 3416.9739, 2139.9512, 3254.1812, 2563.9465, 3218.4138, 1195.626, 969.24884]
2025-09-12 15:43:37,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [904.0, 347.0, 649.0, 1000.0, 646.0, 1000.0, 735.0, 954.0, 376.0, 322.0]
2025-09-12 15:43:37,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 3 minutes, 57 seconds)
2025-09-12 15:55:20,007 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:55:20,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:58:31,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2239.90088 ± 1370.388
2025-09-12 15:58:31,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1292.8438, 1248.0569, 2.5552714, 3287.556, 2.3737402, 3300.823, 3426.0237, 3260.9294, 3370.4268, 3207.4207]
2025-09-12 15:58:31,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [397.0, 426.0, 23.0, 1000.0, 13.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:58:31,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 50 minutes, 1 second)
2025-09-12 16:10:36,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:10:36,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:14:14,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2546.88037 ± 1099.985
2025-09-12 16:14:14,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3212.686, 1645.988, 1408.4492, 5.0713935, 3444.8362, 3290.804, 2544.8042, 3343.4648, 3289.4705, 3283.2292]
2025-09-12 16:14:14,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 503.0, 462.0, 15.0, 1000.0, 1000.0, 762.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:14:14,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 32 minutes, 48 seconds)
2025-09-12 16:26:50,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:26:50,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:30:42,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2786.95557 ± 652.783
2025-09-12 16:30:42,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1647.8483, 2586.8262, 3447.1306, 3309.7783, 3294.7725, 3095.398, 1715.5068, 2274.847, 3359.924, 3137.5225]
2025-09-12 16:30:42,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [482.0, 751.0, 1000.0, 1000.0, 1000.0, 891.0, 497.0, 683.0, 1000.0, 1000.0]
2025-09-12 16:30:42,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 19 minutes, 15 seconds)
2025-09-12 16:42:59,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:42:59,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:45:54,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2052.87378 ± 1152.853
2025-09-12 16:45:54,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3328.611, 758.63275, 1441.6194, 3285.8232, 3381.562, 1824.539, 1019.0064, 261.34393, 1786.4413, 3441.161]
2025-09-12 16:45:54,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 237.0, 418.0, 1000.0, 1000.0, 517.0, 333.0, 216.0, 515.0, 1000.0]
2025-09-12 16:45:54,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 2 minutes, 46 seconds)
2025-09-12 16:57:49,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:57:49,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:01:31,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2647.19702 ± 981.773
2025-09-12 17:01:31,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [2935.7566, 3526.4595, 876.85187, 3393.435, 3340.8274, 3374.0059, 1423.7795, 2919.5427, 1262.8827, 3418.4304]
2025-09-12 17:01:31,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [886.0, 1000.0, 283.0, 1000.0, 1000.0, 1000.0, 493.0, 853.0, 379.0, 1000.0]
2025-09-12 17:01:31,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 44 seconds)
2025-09-12 17:13:30,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:13:30,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:18:02,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 3191.97778 ± 279.460
2025-09-12 17:18:02,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3478.9858, 3187.4165, 3306.9216, 3193.511, 3153.4429, 3314.2773, 3266.713, 3318.5056, 3304.6653, 2395.3374]
2025-09-12 17:18:02,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 964.0, 1000.0, 1000.0, 973.0, 1000.0, 1000.0, 1000.0, 1000.0, 691.0]
2025-09-12 17:18:02,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (3191.98) for latency ExtremeClogL1U23
2025-09-12 17:18:02,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 31 minutes, 48 seconds)
2025-09-12 17:30:52,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:30:52,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:35:27,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 3199.49243 ± 231.867
2025-09-12 17:35:27,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [3414.4531, 3334.2822, 3289.245, 3379.6626, 2894.056, 3108.1768, 3308.2341, 3297.5115, 2654.99, 3314.313]
2025-09-12 17:35:27,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 787.0, 1000.0]
2025-09-12 17:35:27,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1226 [INFO]: New best (3199.49) for latency ExtremeClogL1U23
2025-09-12 17:35:27,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 14 seconds)
2025-09-12 17:47:15,623 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:47:15,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:50:56,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1221 [DEBUG]: Total Reward: 2579.17798 ± 1087.599
2025-09-12 17:50:56,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1222 [DEBUG]: All rewards: [1780.4092, 3272.2542, 2785.9795, 3216.2063, 1187.5464, 3402.3562, 3253.9783, 155.55734, 3179.8423, 3557.6506]
2025-09-12 17:50:56,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1223 [DEBUG]: All trajectory lengths: [567.0, 1000.0, 798.0, 1000.0, 363.0, 1000.0, 1000.0, 112.0, 1000.0, 1000.0]
2025-09-12 17:50:56,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-walker2d):1251 [DEBUG]: Training session finished
