2025-09-11 19:32:34,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:32:34,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 19:32:34,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a7c9690dd0>}
2025-09-11 19:32:34,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1111 [DEBUG]: using device: cuda
2025-09-11 19:32:34,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1133 [INFO]: Creating new trainer
2025-09-11 19:32:34,469 baseline-mbpac-noiseperc0-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 19:32:34,469 baseline-mbpac-noiseperc0-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 19:32:34,477 baseline-mbpac-noiseperc0-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 19:32:35,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1194 [DEBUG]: Starting training session...
2025-09-11 19:32:35,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 1/100
2025-09-11 19:42:37,355 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:42:37,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:43:34,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 187.86972 ± 127.937
2025-09-11 19:43:34,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [395.86948, 111.59676, 317.0824, 64.78844, 75.19326, 112.61941, 66.68116, 356.05908, 85.92689, 292.8803]
2025-09-11 19:43:34,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 223.0, 198.0, 174.0, 187.0, 228.0, 188.0, 228.0, 198.0, 169.0]
2025-09-11 19:43:34,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (187.87) for latency ExtremeClogL1U23
2025-09-11 19:43:34,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 7 minutes, 37 seconds)
2025-09-11 19:55:07,236 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:55:07,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:55:45,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 92.96309 ± 90.085
2025-09-11 19:55:45,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [101.31838, 166.31279, 61.05006, 66.269875, 15.388479, 92.47784, 75.34773, 326.575, 5.295774, 19.594986]
2025-09-11 19:55:45,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 135.0, 133.0, 206.0, 43.0, 86.0, 99.0, 182.0, 162.0, 191.0]
2025-09-11 19:55:45,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 55 minutes, 17 seconds)
2025-09-11 20:07:20,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:07:20,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:08:12,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 220.18784 ± 78.724
2025-09-11 20:08:12,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [274.85376, 315.6175, 221.7316, 295.57285, 58.259663, 250.7045, 108.31807, 201.64175, 195.29143, 279.8872]
2025-09-11 20:08:12,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 165.0, 113.0, 207.0, 157.0, 133.0, 228.0, 145.0, 350.0, 175.0]
2025-09-11 20:08:12,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (220.19) for latency ExtremeClogL1U23
2025-09-11 20:08:12,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 11 minutes, 30 seconds)
2025-09-11 20:19:51,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:19:51,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:20:35,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 310.56238 ± 77.876
2025-09-11 20:20:35,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [301.60083, 254.59424, 319.69324, 379.7239, 365.6461, 227.64871, 182.51424, 467.39062, 331.37833, 275.43338]
2025-09-11 20:20:35,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 143.0, 163.0, 176.0, 205.0, 123.0, 96.0, 248.0, 147.0, 137.0]
2025-09-11 20:20:35,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (310.56) for latency ExtremeClogL1U23
2025-09-11 20:20:35,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 12 minutes, 5 seconds)
2025-09-11 20:32:16,863 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:32:16,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:33:10,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 305.99969 ± 115.458
2025-09-11 20:33:10,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [148.65427, 232.33418, 194.5906, 333.1241, 508.2116, 235.14284, 323.71497, 267.2816, 511.54562, 305.3973]
2025-09-11 20:33:10,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 388.0, 127.0, 178.0, 265.0, 133.0, 161.0, 147.0, 219.0, 225.0]
2025-09-11 20:33:10,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 11 minutes, 6 seconds)
2025-09-11 20:44:34,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:44:34,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:45:20,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 328.57632 ± 79.062
2025-09-11 20:45:20,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [336.03775, 336.0956, 468.10233, 299.13425, 333.57657, 454.0455, 320.99182, 228.05957, 203.5577, 306.1621]
2025-09-11 20:45:20,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 159.0, 220.0, 161.0, 159.0, 222.0, 158.0, 115.0, 119.0, 162.0]
2025-09-11 20:45:20,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (328.58) for latency ExtremeClogL1U23
2025-09-11 20:45:20,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 19 hours, 21 minutes, 4 seconds)
2025-09-11 20:57:04,667 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:57:04,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:58:00,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 415.14569 ± 154.929
2025-09-11 20:58:00,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [690.23035, 63.28248, 424.44418, 442.19168, 414.87665, 430.64267, 506.8253, 312.9405, 533.01843, 333.00513]
2025-09-11 20:58:00,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 135.0, 192.0, 204.0, 202.0, 198.0, 229.0, 168.0, 212.0, 218.0]
2025-09-11 20:58:00,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (415.15) for latency ExtremeClogL1U23
2025-09-11 20:58:00,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 19 hours, 17 minutes, 53 seconds)
2025-09-11 21:09:47,851 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:09:47,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:10:45,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 391.49234 ± 132.034
2025-09-11 21:10:45,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [518.9775, 297.19662, 516.5119, 464.15796, 443.06183, 411.22495, 378.7247, 458.54724, 44.38918, 382.1313]
2025-09-11 21:10:45,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [269.0, 159.0, 286.0, 229.0, 215.0, 227.0, 191.0, 235.0, 125.0, 151.0]
2025-09-11 21:10:45,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 19 hours, 10 minutes, 58 seconds)
2025-09-11 21:22:00,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:22:00,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:22:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 336.89801 ± 96.805
2025-09-11 21:22:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [381.8156, 308.04773, 462.2883, 289.01968, 435.6612, 384.1582, 329.84177, 380.1136, 96.74222, 301.29166]
2025-09-11 21:22:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 148.0, 214.0, 130.0, 234.0, 213.0, 177.0, 167.0, 139.0, 136.0]
2025-09-11 21:22:49,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 52 minutes, 36 seconds)
2025-09-11 21:34:32,076 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:34:32,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:35:27,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 413.60785 ± 49.847
2025-09-11 21:35:27,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [326.24557, 398.75848, 362.10083, 366.69217, 462.0526, 405.4636, 489.42776, 410.5763, 465.67947, 449.0818]
2025-09-11 21:35:27,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 181.0, 167.0, 146.0, 243.0, 172.0, 224.0, 194.0, 242.0, 236.0]
2025-09-11 21:35:27,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 41 minutes)
2025-09-11 21:46:58,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:46:58,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:47:47,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 368.46707 ± 87.307
2025-09-11 21:47:47,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [471.08127, 485.35812, 347.74976, 268.35907, 420.23257, 305.6253, 331.96072, 295.55017, 260.9184, 497.8352]
2025-09-11 21:47:47,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [223.0, 215.0, 163.0, 144.0, 206.0, 145.0, 138.0, 147.0, 123.0, 272.0]
2025-09-11 21:47:47,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 18 hours, 31 minutes, 47 seconds)
2025-09-11 21:59:31,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:59:31,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:00:34,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 510.89642 ± 132.056
2025-09-11 22:00:34,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [410.50708, 601.5588, 533.12085, 526.8298, 316.96445, 658.15765, 468.81705, 326.2249, 756.1958, 510.58768]
2025-09-11 22:00:34,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 258.0, 207.0, 227.0, 139.0, 275.0, 241.0, 159.0, 307.0, 296.0]
2025-09-11 22:00:34,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (510.90) for latency ExtremeClogL1U23
2025-09-11 22:00:34,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 18 hours, 21 minutes, 9 seconds)
2025-09-11 22:11:45,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:11:45,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:13:00,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 673.35193 ± 347.319
2025-09-11 22:13:00,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1357.7252, 727.41174, 711.5245, 310.6056, 329.8371, 397.23618, 290.1406, 1032.3896, 558.61523, 1018.03326]
2025-09-11 22:13:00,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [510.0, 294.0, 277.0, 144.0, 155.0, 222.0, 136.0, 422.0, 238.0, 379.0]
2025-09-11 22:13:00,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (673.35) for latency ExtremeClogL1U23
2025-09-11 22:13:00,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 18 hours, 3 minutes, 19 seconds)
2025-09-11 22:24:37,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:24:37,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:25:26,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 435.66885 ± 104.161
2025-09-11 22:25:26,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [382.85718, 438.73972, 420.66098, 438.40128, 393.56055, 484.91287, 487.75494, 659.7215, 215.78038, 434.2993]
2025-09-11 22:25:26,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 187.0, 190.0, 200.0, 166.0, 191.0, 184.0, 251.0, 147.0, 166.0]
2025-09-11 22:25:26,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 56 minutes, 53 seconds)
2025-09-11 22:37:16,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:37:16,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:38:24,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 489.15274 ± 99.859
2025-09-11 22:38:24,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [443.1587, 504.83676, 713.94556, 455.50793, 493.04807, 436.2739, 571.729, 523.8197, 308.95895, 440.24878]
2025-09-11 22:38:24,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 217.0, 636.0, 167.0, 215.0, 225.0, 243.0, 259.0, 139.0, 202.0]
2025-09-11 22:38:24,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 50 minutes, 6 seconds)
2025-09-11 22:49:11,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:49:11,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:50:33,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 690.13477 ± 224.181
2025-09-11 22:50:33,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [755.69275, 801.0972, 424.41318, 715.2394, 477.54535, 800.8477, 613.1589, 1204.2183, 713.1637, 395.97095]
2025-09-11 22:50:33,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 323.0, 177.0, 307.0, 202.0, 325.0, 286.0, 508.0, 356.0, 188.0]
2025-09-11 22:50:33,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (690.13) for latency ExtremeClogL1U23
2025-09-11 22:50:33,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 17 hours, 34 minutes, 25 seconds)
2025-09-11 23:02:07,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:02:07,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:03:40,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 924.28210 ± 407.556
2025-09-11 23:03:40,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [561.08966, 1328.2803, 795.1633, 1284.8745, 1085.51, 383.93643, 440.71497, 809.0437, 827.02295, 1727.1853]
2025-09-11 23:03:40,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [240.0, 509.0, 314.0, 526.0, 373.0, 153.0, 161.0, 293.0, 308.0, 563.0]
2025-09-11 23:03:40,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (924.28) for latency ExtremeClogL1U23
2025-09-11 23:03:40,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 17 hours, 27 minutes, 30 seconds)
2025-09-11 23:14:46,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:14:46,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:17:22,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1462.11328 ± 693.527
2025-09-11 23:17:22,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2665.989, 2442.669, 737.87555, 2087.209, 1631.3417, 760.743, 1178.2913, 699.6986, 914.6719, 1502.6431]
2025-09-11 23:17:22,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 260.0, 825.0, 667.0, 307.0, 436.0, 245.0, 375.0, 572.0]
2025-09-11 23:17:22,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1462.11) for latency ExtremeClogL1U23
2025-09-11 23:17:22,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 17 hours, 35 minutes, 35 seconds)
2025-09-11 23:29:23,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:29:23,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:32:27,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1776.46521 ± 715.560
2025-09-11 23:32:27,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2749.6643, 2249.3997, 2182.973, 1031.1979, 2708.7317, 1037.8037, 2339.0957, 1427.2668, 1357.532, 680.9892]
2025-09-11 23:32:27,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 805.0, 344.0, 1000.0, 355.0, 1000.0, 452.0, 544.0, 246.0]
2025-09-11 23:32:27,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1776.47) for latency ExtremeClogL1U23
2025-09-11 23:32:27,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 5 minutes, 45 seconds)
2025-09-11 23:43:22,456 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:43:22,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:46:06,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1641.19556 ± 790.438
2025-09-11 23:46:06,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2713.9937, 675.293, 2629.3284, 2616.0103, 1831.2542, 839.77673, 1360.0496, 580.62524, 2000.1407, 1165.4841]
2025-09-11 23:46:06,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 313.0, 919.0, 1000.0, 666.0, 304.0, 454.0, 247.0, 690.0, 405.0]
2025-09-11 23:46:06,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 3 minutes, 23 seconds)
2025-09-11 23:57:24,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:57:24,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:59:50,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1573.58398 ± 553.586
2025-09-11 23:59:50,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2820.3936, 1198.6014, 1492.6516, 1617.264, 1906.4276, 1966.8683, 1176.4736, 1683.6749, 1176.0782, 697.40625]
2025-09-11 23:59:50,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 440.0, 465.0, 492.0, 623.0, 634.0, 389.0, 566.0, 409.0, 328.0]
2025-09-11 23:59:50,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 14 minutes, 39 seconds)
2025-09-12 00:11:58,830 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:11:58,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:15:52,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2436.29126 ± 801.255
2025-09-12 00:15:52,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2818.4702, 599.2919, 2884.3125, 2728.9507, 1104.0974, 2833.3896, 2868.863, 2832.934, 2843.902, 2848.7017]
2025-09-12 00:15:52,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 205.0, 1000.0, 1000.0, 414.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:15:52,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2436.29) for latency ExtremeClogL1U23
2025-09-12 00:15:52,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 46 minutes, 18 seconds)
2025-09-12 00:26:38,023 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:26:38,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:30:06,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2031.96423 ± 895.867
2025-09-12 00:30:06,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2625.3853, 1323.3629, 2781.833, 473.81052, 2641.4724, 2884.5198, 1149.6523, 2900.0627, 2610.4463, 929.09924]
2025-09-12 00:30:06,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 544.0, 1000.0, 217.0, 1000.0, 1000.0, 430.0, 1000.0, 1000.0, 391.0]
2025-09-12 00:30:06,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 39 minutes, 53 seconds)
2025-09-12 00:42:08,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:42:08,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:46:06,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2619.67017 ± 620.940
2025-09-12 00:46:06,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2938.3115, 2905.578, 3014.6863, 2896.355, 1957.9607, 3105.9934, 2987.2646, 1063.3854, 2983.2837, 2343.8828]
2025-09-12 00:46:06,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 656.0, 1000.0, 1000.0, 370.0, 1000.0, 787.0]
2025-09-12 00:46:06,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2619.67) for latency ExtremeClogL1U23
2025-09-12 00:46:06,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 18 hours, 39 minutes, 37 seconds)
2025-09-12 00:56:45,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:56:45,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:59:39,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1891.31128 ± 1000.882
2025-09-12 00:59:39,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1119.295, 24.173834, 1822.6688, 2777.767, 1334.2909, 3100.298, 2997.6304, 1117.8231, 1534.7023, 3084.4631]
2025-09-12 00:59:39,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [389.0, 32.0, 573.0, 1000.0, 453.0, 1000.0, 1000.0, 423.0, 501.0, 1000.0]
2025-09-12 00:59:39,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 23 minutes, 9 seconds)
2025-09-12 01:11:15,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:11:15,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:14:35,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2246.56396 ± 1188.018
2025-09-12 01:14:35,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3255.9026, 316.2605, 3055.6023, 3232.975, 50.38091, 1902.2694, 3054.063, 1408.5109, 3150.5298, 3039.1438]
2025-09-12 01:14:35,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [998.0, 169.0, 1000.0, 1000.0, 63.0, 666.0, 1000.0, 469.0, 1000.0, 1000.0]
2025-09-12 01:14:35,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 26 minutes, 24 seconds)
2025-09-12 01:26:31,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:26:31,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:30:22,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2868.15308 ± 738.100
2025-09-12 01:30:22,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3097.78, 1250.0424, 3536.03, 3375.7192, 2466.1248, 3291.1455, 3372.1113, 3212.106, 1783.4899, 3296.9824]
2025-09-12 01:30:22,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [844.0, 409.0, 1000.0, 975.0, 739.0, 1000.0, 952.0, 939.0, 549.0, 1000.0]
2025-09-12 01:30:22,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2868.15) for latency ExtremeClogL1U23
2025-09-12 01:30:22,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 7 minutes, 35 seconds)
2025-09-12 01:41:20,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:41:20,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:44:25,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2335.92114 ± 964.230
2025-09-12 01:44:25,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [727.437, 3330.0881, 3056.3088, 1013.26495, 3237.1157, 1277.5394, 2690.713, 2674.0078, 3400.6284, 1952.1064]
2025-09-12 01:44:25,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [267.0, 1000.0, 883.0, 320.0, 913.0, 402.0, 772.0, 762.0, 1000.0, 593.0]
2025-09-12 01:44:25,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 17 hours, 50 minutes, 13 seconds)
2025-09-12 01:55:41,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:55:41,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:59:46,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3097.15479 ± 551.805
2025-09-12 01:59:46,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2397.9807, 3492.6265, 3227.1338, 3135.1396, 3308.2654, 1732.588, 3294.472, 3383.016, 3379.2913, 3621.036]
2025-09-12 01:59:46,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [702.0, 1000.0, 1000.0, 1000.0, 1000.0, 509.0, 1000.0, 1000.0, 991.0, 1000.0]
2025-09-12 01:59:46,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3097.15) for latency ExtremeClogL1U23
2025-09-12 01:59:46,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 17 hours, 26 minutes, 4 seconds)
2025-09-12 02:11:03,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:11:03,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:13:40,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1882.88159 ± 1510.854
2025-09-12 02:13:40,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1960.7219, 3374.7234, 3271.0908, 3289.9917, 355.27887, 21.436653, 3212.203, 5.314359, 9.858867, 3328.198]
2025-09-12 02:13:40,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [614.0, 1000.0, 1000.0, 1000.0, 163.0, 33.0, 1000.0, 26.0, 27.0, 1000.0]
2025-09-12 02:13:40,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 16 minutes, 16 seconds)
2025-09-12 02:25:35,794 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:25:35,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:29:37,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3005.35620 ± 600.458
2025-09-12 02:29:37,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3353.4583, 2242.2102, 1486.4221, 3216.4824, 3154.5596, 3183.5247, 3355.8872, 3284.6338, 3421.3174, 3355.0654]
2025-09-12 02:29:37,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 646.0, 497.0, 911.0, 1000.0, 1000.0, 972.0, 960.0, 1000.0, 1000.0]
2025-09-12 02:29:37,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 15 minutes, 18 seconds)
2025-09-12 02:40:44,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:40:44,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:44:40,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2993.37744 ± 927.592
2025-09-12 02:44:40,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3404.1362, 3534.4658, 2463.2437, 343.27853, 3384.178, 3433.058, 3339.4277, 3304.158, 3393.398, 3334.4297]
2025-09-12 02:44:40,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 725.0, 149.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 946.0]
2025-09-12 02:44:40,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 50 minutes, 31 seconds)
2025-09-12 02:55:40,252 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:55:40,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:59:32,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2834.54224 ± 930.584
2025-09-12 02:59:32,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3514.3276, 3446.9739, 2607.3582, 3217.697, 3297.4805, 1511.1437, 3238.3242, 651.7549, 3533.6016, 3326.7598]
2025-09-12 02:59:32,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 798.0, 1000.0, 1000.0, 484.0, 1000.0, 234.0, 1000.0, 1000.0]
2025-09-12 02:59:32,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 46 minutes, 37 seconds)
2025-09-12 03:11:38,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:11:38,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:15:26,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2958.08594 ± 1011.077
2025-09-12 03:15:26,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3588.6147, 2653.21, 78.14307, 2911.2937, 3444.4246, 3398.477, 2905.9702, 3466.3716, 3532.078, 3602.278]
2025-09-12 03:15:26,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 772.0, 89.0, 797.0, 1000.0, 1000.0, 840.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:15:26,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 38 minutes, 42 seconds)
2025-09-12 03:25:57,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:25:57,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:29:42,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3013.63623 ± 1145.198
2025-09-12 03:29:42,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1752.5697, 3744.104, 3189.234, 3583.397, 3182.2705, 3693.133, 3648.8792, 3683.0737, 3631.3677, 28.336636]
2025-09-12 03:29:42,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [530.0, 1000.0, 906.0, 1000.0, 847.0, 1000.0, 1000.0, 1000.0, 1000.0, 50.0]
2025-09-12 03:29:42,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 28 minutes, 27 seconds)
2025-09-12 03:41:31,795 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:41:31,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:45:20,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2951.71680 ± 733.711
2025-09-12 03:45:20,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1558.5264, 2679.8127, 3448.5808, 3213.0984, 3710.203, 3533.887, 3553.398, 3565.9375, 2266.3894, 1987.3362]
2025-09-12 03:45:20,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [453.0, 813.0, 1000.0, 888.0, 1000.0, 1000.0, 1000.0, 1000.0, 682.0, 597.0]
2025-09-12 03:45:20,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 16 hours, 9 minutes, 9 seconds)
2025-09-12 03:56:17,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:56:17,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:00:02,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3049.57080 ± 1135.666
2025-09-12 04:00:02,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2416.0027, 3439.4915, 3799.027, 3687.2847, 3782.1245, 3703.7908, 3782.6418, 2131.5012, 104.85187, 3648.9917]
2025-09-12 04:00:02,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [654.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 586.0, 81.0, 1000.0]
2025-09-12 04:00:02,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 49 minutes, 35 seconds)
2025-09-12 04:11:51,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:11:51,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:15:20,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2713.40723 ± 1291.090
2025-09-12 04:15:20,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [715.82306, 3471.8894, 3584.7935, 3572.5728, 1152.8766, 3602.6228, 3436.137, 3570.7297, 409.8104, 3616.817]
2025-09-12 04:15:20,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [285.0, 1000.0, 1000.0, 1000.0, 357.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0]
2025-09-12 04:15:20,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 39 minutes, 55 seconds)
2025-09-12 04:26:45,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:26:45,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:30:56,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3244.38574 ± 974.703
2025-09-12 04:30:56,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3469.2092, 3690.4102, 3589.8794, 334.61978, 3514.3376, 3585.8467, 3639.5847, 3557.896, 3699.2588, 3362.8145]
2025-09-12 04:30:56,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:30:56,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3244.39) for latency ExtremeClogL1U23
2025-09-12 04:30:57,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 21 minutes, 11 seconds)
2025-09-12 04:41:52,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:41:52,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:46:23,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3605.65161 ± 143.675
2025-09-12 04:46:23,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3602.3333, 3792.5618, 3693.049, 3353.2964, 3412.0093, 3525.0544, 3524.7412, 3642.1648, 3715.1318, 3796.1719]
2025-09-12 04:46:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:46:23,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3605.65) for latency ExtremeClogL1U23
2025-09-12 04:46:23,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 20 minutes, 5 seconds)
2025-09-12 04:57:56,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:57:56,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:01:31,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2914.36035 ± 1145.097
2025-09-12 05:01:31,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3675.412, 1935.704, 189.76814, 3636.902, 3599.6458, 2870.4756, 3884.8174, 1936.6064, 3750.174, 3664.0972]
2025-09-12 05:01:31,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 519.0, 116.0, 1000.0, 1000.0, 754.0, 1000.0, 510.0, 1000.0, 1000.0]
2025-09-12 05:01:31,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 59 minutes, 1 second)
2025-09-12 05:12:46,957 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:12:46,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:17:00,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3437.68481 ± 685.696
2025-09-12 05:17:00,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3747.0266, 3622.629, 1393.2649, 3678.7595, 3769.6638, 3517.6646, 3747.5767, 3674.0125, 3658.2903, 3567.958]
2025-09-12 05:17:00,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 407.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:17:00,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 52 minutes, 52 seconds)
2025-09-12 05:28:45,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:28:45,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:32:48,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3205.04346 ± 784.219
2025-09-12 05:32:48,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3675.4807, 3610.1548, 1700.5874, 3475.6135, 3476.9932, 3720.2395, 1593.4856, 3470.3672, 3674.6157, 3652.8975]
2025-09-12 05:32:48,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 479.0, 1000.0, 1000.0, 1000.0, 470.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:32:48,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 43 minutes, 4 seconds)
2025-09-12 05:43:50,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:50,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:48:14,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3534.14331 ± 239.048
2025-09-12 05:48:14,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3752.3735, 3025.8315, 3602.9163, 3625.0923, 3771.7825, 3679.565, 3661.2815, 3578.7795, 3139.4702, 3504.3433]
2025-09-12 05:48:14,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 849.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 909.0, 1000.0]
2025-09-12 05:48:14,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 25 minutes, 34 seconds)
2025-09-12 05:59:53,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:59:53,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:03:05,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2536.28320 ± 1279.318
2025-09-12 06:03:05,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2416.2502, 1562.5608, 2468.709, 801.7959, 3370.127, 3702.9722, 5.8474803, 3565.146, 3720.6948, 3748.7292]
2025-09-12 06:03:05,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [671.0, 438.0, 715.0, 278.0, 1000.0, 1000.0, 19.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:03:05,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 3 minutes, 42 seconds)
2025-09-12 06:14:29,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:14:29,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:18:58,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3710.39380 ± 59.808
2025-09-12 06:18:58,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3708.3584, 3664.1, 3581.2769, 3679.8074, 3714.9915, 3783.0286, 3732.6016, 3745.0105, 3803.4692, 3691.2917]
2025-09-12 06:18:58,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:18:58,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3710.39) for latency ExtremeClogL1U23
2025-09-12 06:18:58,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 56 minutes, 24 seconds)
2025-09-12 06:29:48,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:29:48,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:34:05,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3636.98828 ± 514.275
2025-09-12 06:34:05,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3821.1775, 3759.7285, 3531.4502, 3830.172, 3870.6992, 3846.5508, 2122.202, 3831.5398, 3874.6838, 3881.6814]
2025-09-12 06:34:05,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 602.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:34:05,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 36 minutes, 57 seconds)
2025-09-12 06:45:23,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:45:23,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:48:29,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2470.96118 ± 1305.588
2025-09-12 06:48:29,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3639.0398, 565.312, 3678.5693, 3527.0984, 3678.6267, 523.9412, 1154.0834, 2777.4673, 1493.3535, 3672.1199]
2025-09-12 06:48:29,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 222.0, 1000.0, 1000.0, 1000.0, 179.0, 389.0, 786.0, 411.0, 1000.0]
2025-09-12 06:48:29,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 7 minutes, 4 seconds)
2025-09-12 07:00:19,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:00:19,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:04:01,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3030.01123 ± 1202.898
2025-09-12 07:04:01,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3660.9656, 236.55496, 3833.3054, 2357.4248, 3719.665, 3794.55, 3587.4102, 3872.1428, 3792.2205, 1445.871]
2025-09-12 07:04:01,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 119.0, 1000.0, 638.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 460.0]
2025-09-12 07:04:01,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 53 minutes, 3 seconds)
2025-09-12 07:14:51,260 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:14:51,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:19:05,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3461.70239 ± 759.071
2025-09-12 07:19:05,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3745.3428, 3695.94, 3854.0596, 3252.7441, 3730.6936, 3735.0776, 3831.2178, 3758.0613, 1235.258, 3778.6306]
2025-09-12 07:19:05,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 369.0, 1000.0]
2025-09-12 07:19:05,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 39 minutes, 59 seconds)
2025-09-12 07:30:27,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:30:27,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:34:33,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3222.57617 ± 847.954
2025-09-12 07:34:33,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1726.7627, 3644.5554, 3740.5588, 3687.3513, 3687.8298, 3563.614, 3674.056, 3404.1077, 1362.713, 3734.213]
2025-09-12 07:34:33,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [509.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 478.0, 1000.0]
2025-09-12 07:34:33,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 20 minutes, 41 seconds)
2025-09-12 07:45:59,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:45:59,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:49:55,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3203.40088 ± 963.552
2025-09-12 07:49:55,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3538.0752, 3386.0344, 666.9636, 3664.542, 3741.4656, 2150.9116, 3613.6685, 3821.7441, 3717.7021, 3732.9036]
2025-09-12 07:49:55,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [936.0, 1000.0, 229.0, 1000.0, 1000.0, 589.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:49:55,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 8 minutes, 5 seconds)
2025-09-12 08:02:15,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:02:15,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:06:10,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3352.86523 ± 1097.080
2025-09-12 08:06:10,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3847.117, 4026.4478, 3908.159, 3825.331, 3974.5264, 3992.6873, 1437.5259, 3694.235, 3903.3452, 919.2777]
2025-09-12 08:06:10,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 402.0, 1000.0, 1000.0, 299.0]
2025-09-12 08:06:10,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 10 minutes, 19 seconds)
2025-09-12 08:16:36,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:16:36,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:20:38,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3293.99854 ± 872.117
2025-09-12 08:20:38,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3836.3774, 3716.4944, 3808.7622, 3589.259, 3719.3875, 3335.759, 878.6342, 2652.8845, 3597.0776, 3805.3474]
2025-09-12 08:20:38,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 896.0, 289.0, 709.0, 1000.0, 1000.0]
2025-09-12 08:20:38,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 44 minutes, 50 seconds)
2025-09-12 08:32:03,150 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:32:03,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:36:00,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3218.01514 ± 826.810
2025-09-12 08:36:00,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3729.0571, 3876.2993, 2197.8218, 1843.0918, 3635.0747, 3747.8684, 3776.3523, 3767.3706, 3751.962, 1855.253]
2025-09-12 08:36:00,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 570.0, 506.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 534.0]
2025-09-12 08:36:00,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 32 minutes, 20 seconds)
2025-09-12 08:48:17,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:48:17,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:52:18,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3402.17456 ± 1040.775
2025-09-12 08:52:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3898.445, 3825.932, 372.02832, 3908.666, 3649.0662, 3021.53, 3800.3914, 3854.0488, 3899.1912, 3792.4458]
2025-09-12 08:52:18,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 140.0, 1000.0, 936.0, 790.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:52:18,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 24 minutes, 12 seconds)
2025-09-12 09:02:52,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:02:52,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:06:36,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3144.46631 ± 1435.959
2025-09-12 09:06:36,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3846.6428, 3776.9868, 3832.1257, 3939.081, 3927.7703, 3990.4067, 3776.3545, 3798.796, 411.207, 145.29018]
2025-09-12 09:06:36,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 137.0, 96.0]
2025-09-12 09:06:36,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 59 minutes, 28 seconds)
2025-09-12 09:18:41,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:18:41,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:22:43,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3304.62646 ± 1057.738
2025-09-12 09:22:43,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2055.7039, 3819.9094, 3779.776, 3845.5823, 549.16846, 3703.0808, 3820.1907, 3795.4358, 3927.7996, 3749.621]
2025-09-12 09:22:43,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [565.0, 1000.0, 1000.0, 1000.0, 187.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:22:43,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 42 minutes, 55 seconds)
2025-09-12 09:34:00,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:34:00,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:38:09,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3539.14136 ± 904.293
2025-09-12 09:38:09,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [881.7376, 3940.9812, 3341.9788, 3857.721, 3824.0676, 3994.7463, 3937.6594, 3995.5742, 3728.3792, 3888.5688]
2025-09-12 09:38:09,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 1000.0, 895.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:38:09,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 35 minutes, 40 seconds)
2025-09-12 09:49:15,229 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:49:15,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:53:27,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3469.28320 ± 845.588
2025-09-12 09:53:27,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3904.0337, 3839.9922, 3766.4138, 3688.9575, 969.4366, 3738.497, 3745.3398, 3358.122, 3824.2131, 3857.827]
2025-09-12 09:53:27,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 308.0, 977.0, 1000.0, 883.0, 1000.0, 1000.0]
2025-09-12 09:53:27,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 19 minutes, 32 seconds)
2025-09-12 10:05:15,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:05:15,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:09:27,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3597.16455 ± 752.638
2025-09-12 10:09:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3927.7214, 3906.5322, 3468.7249, 1373.7369, 3852.533, 3901.7507, 3886.2083, 3933.8372, 3915.8628, 3804.7354]
2025-09-12 10:09:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 389.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:09:27,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 1 minute, 45 seconds)
2025-09-12 10:20:41,174 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:20:41,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:24:09,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2944.25269 ± 1071.203
2025-09-12 10:24:09,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2906.7205, 2741.6875, 3941.9707, 975.9019, 3862.2888, 3729.523, 3889.316, 3842.516, 1195.4459, 2357.1558]
2025-09-12 10:24:09,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [741.0, 740.0, 1000.0, 276.0, 1000.0, 1000.0, 1000.0, 1000.0, 329.0, 691.0]
2025-09-12 10:24:09,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 49 minutes, 24 seconds)
2025-09-12 10:34:56,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:34:56,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:39:29,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3810.87305 ± 79.632
2025-09-12 10:39:29,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3816.9917, 3830.9695, 3807.5144, 3640.189, 3844.4465, 3708.8103, 3833.0042, 3821.3386, 3951.7615, 3853.7058]
2025-09-12 10:39:29,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:39:29,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3810.87) for latency ExtremeClogL1U23
2025-09-12 10:39:29,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 28 minutes, 3 seconds)
2025-09-12 10:51:04,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:51:04,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:55:35,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3716.05151 ± 147.244
2025-09-12 10:55:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3680.3254, 3748.329, 3740.6543, 3828.3901, 3834.2476, 3847.2058, 3372.2607, 3617.5706, 3887.9553, 3603.5757]
2025-09-12 10:55:35,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:55:35,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 17 minutes, 33 seconds)
2025-09-12 11:07:21,350 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:07:21,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:11:53,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3859.79419 ± 48.169
2025-09-12 11:11:53,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3906.8267, 3963.8818, 3825.2168, 3847.9036, 3792.7085, 3822.7373, 3841.0144, 3905.5396, 3853.874, 3838.237]
2025-09-12 11:11:53,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:11:53,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3859.79) for latency ExtremeClogL1U23
2025-09-12 11:11:53,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 9 minutes, 3 seconds)
2025-09-12 11:23:25,507 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:23:25,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:27:55,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3807.64453 ± 119.560
2025-09-12 11:27:55,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3876.869, 3815.161, 3911.3047, 3475.161, 3876.6194, 3884.1377, 3873.9082, 3799.0732, 3796.7373, 3767.471]
2025-09-12 11:27:55,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:27:55,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 53 minutes, 40 seconds)
2025-09-12 11:38:20,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:38:20,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:41:50,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2910.81250 ± 1350.975
2025-09-12 11:41:50,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3839.3557, 766.87646, 3785.3748, 3788.2078, 3759.4932, 1014.8499, 3833.2986, 3855.0188, 769.74286, 3695.9075]
2025-09-12 11:41:50,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 262.0, 1000.0, 1000.0, 1000.0, 302.0, 1000.0, 1000.0, 228.0, 1000.0]
2025-09-12 11:41:50,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 32 minutes, 40 seconds)
2025-09-12 11:53:22,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:53:22,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:57:05,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3187.94092 ± 1098.422
2025-09-12 11:57:05,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3854.0273, 3878.108, 2028.6183, 3976.9712, 949.99884, 3924.0916, 1683.6385, 3870.9216, 3806.3523, 3906.6838]
2025-09-12 11:57:05,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 522.0, 1000.0, 263.0, 1000.0, 476.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:57:05,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 16 minutes, 42 seconds)
2025-09-12 12:08:17,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:08:17,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:11:50,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2964.07642 ± 1403.562
2025-09-12 12:11:50,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1483.8488, 3855.4456, 3901.7466, 3938.2478, 3785.0923, 238.56793, 3824.3008, 3842.7183, 868.52545, 3902.27]
2025-09-12 12:11:50,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [424.0, 1000.0, 1000.0, 1000.0, 1000.0, 121.0, 1000.0, 1000.0, 279.0, 1000.0]
2025-09-12 12:11:50,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 52 minutes, 40 seconds)
2025-09-12 12:23:18,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:23:18,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:27:17,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3408.17432 ± 1076.701
2025-09-12 12:27:17,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3837.8364, 3817.3518, 2738.9556, 3914.751, 353.41504, 3865.4387, 4029.7842, 3978.7722, 3678.0173, 3867.4202]
2025-09-12 12:27:17,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 718.0, 1000.0, 133.0, 1000.0, 1000.0, 1000.0, 962.0, 1000.0]
2025-09-12 12:27:17,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 32 minutes, 22 seconds)
2025-09-12 12:39:03,170 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:39:03,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:42:52,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3286.18555 ± 1370.685
2025-09-12 12:42:52,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3977.1458, 3914.38, 548.0327, 3943.038, 3938.0684, 543.83, 3955.3228, 4019.237, 4058.003, 3964.7954]
2025-09-12 12:42:52,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 200.0, 1000.0, 1000.0, 183.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:42:52,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 14 minutes, 39 seconds)
2025-09-12 12:54:46,422 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:54:46,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:58:53,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3562.47339 ± 1067.191
2025-09-12 12:58:53,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3469.488, 3879.0322, 3975.925, 4015.6877, 3960.4917, 3937.268, 3991.0796, 4060.7, 3939.7058, 395.35568]
2025-09-12 12:58:53,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [875.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 136.0]
2025-09-12 12:58:53,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 11 minutes, 27 seconds)
2025-09-12 13:10:05,231 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:10:05,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:13:45,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3216.86182 ± 1447.205
2025-09-12 13:13:45,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [327.6382, 3938.3513, 3554.38, 4004.486, 4054.4111, 3968.727, 341.9278, 3960.0347, 3984.5144, 4034.1484]
2025-09-12 13:13:45,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 1000.0, 911.0, 1000.0, 1000.0, 1000.0, 136.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:13:45,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 54 minutes, 1 second)
2025-09-12 13:24:38,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:24:38,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:28:29,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3291.27979 ± 1215.067
2025-09-12 13:28:29,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3844.0168, 929.9207, 4021.5059, 3931.6357, 3802.7375, 799.6365, 3888.3577, 3919.578, 3944.5898, 3830.8176]
2025-09-12 13:28:29,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 267.0, 1000.0, 1000.0, 1000.0, 242.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:28:29,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 38 minutes, 35 seconds)
2025-09-12 13:39:52,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:39:52,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:43:46,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3398.68945 ± 1201.578
2025-09-12 13:43:46,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3901.5444, 198.50299, 3968.8806, 4059.5593, 2113.0354, 3910.0215, 3888.806, 3966.2356, 4043.128, 3937.1804]
2025-09-12 13:43:46,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 96.0, 1000.0, 1000.0, 535.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:43:46,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 22 minutes, 24 seconds)
2025-09-12 13:55:38,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:55:38,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:59:36,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3525.68945 ± 1119.614
2025-09-12 13:59:36,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3953.464, 2947.0005, 4037.8005, 3831.8684, 3959.2021, 3993.5203, 4055.157, 4144.7417, 309.7288, 4024.412]
2025-09-12 13:59:36,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 760.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 998.0, 121.0, 1000.0]
2025-09-12 13:59:36,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 8 minutes, 20 seconds)
2025-09-12 14:10:27,703 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:10:27,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:14:54,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3889.60425 ± 203.242
2025-09-12 14:14:54,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3302.6726, 4015.441, 3935.8735, 3997.4858, 3981.5535, 3930.8853, 4022.199, 3934.0173, 3819.9373, 3955.9817]
2025-09-12 14:14:54,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [845.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:14:54,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3889.60) for latency ExtremeClogL1U23
2025-09-12 14:14:54,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 49 minutes, 43 seconds)
2025-09-12 14:26:40,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:26:40,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:31:00,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3858.15381 ± 417.092
2025-09-12 14:31:00,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4011.3684, 4010.5984, 3882.8423, 4019.2551, 3799.039, 4098.851, 2632.1714, 4042.9016, 4043.1433, 4041.368]
2025-09-12 14:31:00,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 941.0, 1000.0, 659.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:31:00,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 39 minutes, 53 seconds)
2025-09-12 14:42:23,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:42:23,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:46:24,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3604.08740 ± 698.334
2025-09-12 14:46:24,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3090.5286, 3868.934, 3336.84, 4026.5874, 4041.912, 3975.93, 3975.6824, 1733.8854, 4098.0625, 3892.5107]
2025-09-12 14:46:24,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [772.0, 1000.0, 835.0, 1000.0, 1000.0, 1000.0, 1000.0, 447.0, 1000.0, 1000.0]
2025-09-12 14:46:24,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 27 minutes, 15 seconds)
2025-09-12 14:56:57,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:56:57,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:00:52,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3439.22583 ± 858.872
2025-09-12 15:00:52,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3998.8914, 3948.2515, 3934.561, 3966.914, 3957.7605, 2939.0532, 2472.7139, 3823.1655, 4000.8455, 1350.1006]
2025-09-12 15:00:52,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 767.0, 638.0, 1000.0, 1000.0, 374.0]
2025-09-12 15:00:52,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 8 minutes, 23 seconds)
2025-09-12 15:13:15,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:13:15,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:17:04,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3327.42456 ± 1153.655
2025-09-12 15:17:04,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1967.2894, 4030.1675, 3821.4456, 4019.0376, 2983.0063, 3865.4954, 3976.3218, 450.169, 4095.8428, 4065.4695]
2025-09-12 15:17:04,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [527.0, 1000.0, 1000.0, 1000.0, 757.0, 1000.0, 1000.0, 140.0, 1000.0, 1000.0]
2025-09-12 15:17:04,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 54 minutes, 23 seconds)
2025-09-12 15:27:45,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:27:45,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:31:17,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3061.98682 ± 1237.439
2025-09-12 15:31:17,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3061.369, 3979.584, 3927.2893, 802.9644, 1028.3425, 3920.5767, 3930.8335, 3964.0664, 4044.06, 1960.782]
2025-09-12 15:31:17,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [794.0, 1000.0, 1000.0, 229.0, 325.0, 1000.0, 1000.0, 1000.0, 1000.0, 521.0]
2025-09-12 15:31:17,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 34 minutes, 59 seconds)
2025-09-12 15:43:05,868 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:43:05,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:46:55,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3305.37231 ± 1339.334
2025-09-12 15:46:55,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3807.1562, 1086.1816, 4027.9314, 4019.2444, 233.07463, 4037.3826, 3947.3328, 3835.6414, 3952.8594, 4106.9185]
2025-09-12 15:46:55,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 298.0, 1000.0, 1000.0, 86.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:46:55,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 18 minutes, 7 seconds)
2025-09-12 15:58:02,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:58:02,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:01:37,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3120.49048 ± 1221.714
2025-09-12 16:01:37,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3979.6309, 1676.2484, 1914.0344, 3049.6519, 4112.4644, 4003.8408, 3959.7383, 537.086, 3993.8508, 3978.357]
2025-09-12 16:01:37,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 447.0, 512.0, 822.0, 1000.0, 1000.0, 1000.0, 167.0, 1000.0, 1000.0]
2025-09-12 16:01:37,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 42 seconds)
2025-09-12 16:13:17,672 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:13:17,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:17:29,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3631.33472 ± 1069.641
2025-09-12 16:17:29,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4081.3535, 4043.079, 4054.4646, 3910.637, 3962.3284, 3955.9233, 4027.1648, 3993.233, 3856.7678, 428.3937]
2025-09-12 16:17:29,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 145.0]
2025-09-12 16:17:29,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 49 minutes, 52 seconds)
2025-09-12 16:27:51,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:27:51,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:31:37,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3318.32861 ± 1119.861
2025-09-12 16:31:37,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4019.3354, 4087.4353, 2564.2974, 3947.408, 3984.2944, 4028.2422, 838.6052, 1738.4531, 3936.304, 4038.9111]
2025-09-12 16:31:37,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 686.0, 1000.0, 1000.0, 1000.0, 259.0, 477.0, 1000.0, 1000.0]
2025-09-12 16:31:37,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 28 minutes, 41 seconds)
2025-09-12 16:42:58,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:42:58,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:47:05,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3554.39185 ± 827.539
2025-09-12 16:47:05,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4020.5703, 1626.146, 3906.0664, 4076.747, 4042.7075, 2228.7834, 4003.2617, 3856.6208, 3905.8428, 3877.173]
2025-09-12 16:47:05,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 438.0, 1000.0, 998.0, 1000.0, 591.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:47:05,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 17 minutes, 3 seconds)
2025-09-12 16:59:01,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:59:01,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:03:01,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3621.28125 ± 1087.490
2025-09-12 17:03:01,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [433.9972, 4109.513, 4127.672, 4067.675, 4069.7559, 3884.2173, 4155.973, 3971.4702, 4072.7427, 3319.796]
2025-09-12 17:03:01,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 826.0]
2025-09-12 17:03:01,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 2 minutes, 37 seconds)
2025-09-12 17:14:26,290 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:14:26,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:18:36,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3599.67578 ± 967.307
2025-09-12 17:18:36,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4002.0652, 3888.2573, 3708.8262, 3861.1926, 3969.5027, 709.82086, 3920.198, 4052.5466, 3921.4006, 3962.9492]
2025-09-12 17:18:36,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 979.0, 209.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:18:36,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 49 minutes, 22 seconds)
2025-09-12 17:29:22,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:29:22,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:33:13,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3259.76318 ± 1102.789
2025-09-12 17:33:13,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4004.6414, 1157.4045, 3189.1987, 3807.871, 3697.224, 3968.652, 3922.4219, 3908.429, 3899.6838, 1042.1061]
2025-09-12 17:33:13,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 336.0, 803.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 310.0]
2025-09-12 17:33:13,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 31 minutes, 28 seconds)
2025-09-12 17:44:49,896 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:44:49,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:49:24,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4030.01440 ± 82.463
2025-09-12 17:49:24,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4077.5105, 4085.6233, 4142.839, 3932.4756, 4011.667, 3937.501, 3887.3674, 4131.3726, 4059.609, 4034.1807]
2025-09-12 17:49:24,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:49:24,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (4030.01) for latency ExtremeClogL1U23
2025-09-12 17:49:24,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 20 minutes, 1 second)
2025-09-12 18:00:32,662 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:00:32,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:04:21,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3362.43994 ± 1334.216
2025-09-12 18:04:21,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [239.48126, 3957.3538, 4039.2366, 1229.63, 4099.372, 4109.0806, 3967.4465, 3938.5476, 3925.3147, 4118.937]
2025-09-12 18:04:21,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [82.0, 1000.0, 1000.0, 381.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:04:21,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 3 minutes, 38 seconds)
2025-09-12 18:16:07,272 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:16:07,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:20:36,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 4003.62256 ± 70.974
2025-09-12 18:20:36,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4095.7288, 3967.948, 4044.0713, 3973.5322, 3994.8555, 4043.444, 4046.1414, 4032.235, 3820.3667, 4017.9016]
2025-09-12 18:20:36,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:20:36,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 48 minutes, 37 seconds)
2025-09-12 18:31:34,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:31:34,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:35:41,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3685.81396 ± 1121.650
2025-09-12 18:35:41,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4093.9602, 4043.2173, 4104.7754, 4029.4587, 4070.4155, 4014.4907, 4055.4482, 4128.1304, 322.924, 3995.3196]
2025-09-12 18:35:41,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 126.0, 1000.0]
2025-09-12 18:35:41,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 32 minutes, 29 seconds)
2025-09-12 18:47:16,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:47:16,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:51:46,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3976.86328 ± 97.713
2025-09-12 18:51:46,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3951.3462, 4072.975, 4060.1562, 3905.52, 4049.3828, 3916.5012, 3789.189, 4043.474, 4101.07, 3879.0176]
2025-09-12 18:51:46,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:51:46,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2025-09-12 19:03:37,941 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:37,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:08:07,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3985.32300 ± 82.422
2025-09-12 19:08:07,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3853.681, 4076.4753, 3958.7263, 4032.2031, 4000.394, 4040.2146, 4081.8308, 3826.0134, 3965.663, 4018.027]
2025-09-12 19:08:07,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:08:07,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 2 minutes, 58 seconds)
2025-09-12 19:18:51,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:18:51,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:22:30,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3191.17700 ± 1335.719
2025-09-12 19:22:30,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4018.103, 1082.6481, 4136.1235, 3814.8442, 4126.338, 4089.2437, 4027.0986, 4050.0256, 2003.1445, 564.1987]
2025-09-12 19:22:30,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 337.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 488.0, 220.0]
2025-09-12 19:22:30,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 53 seconds)
2025-09-12 19:33:25,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:33:25,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:37:01,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3145.84326 ± 1122.800
2025-09-12 19:37:01,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3968.7373, 4056.5222, 3761.8752, 4196.172, 2718.6072, 1270.7249, 2193.725, 4074.659, 3945.26, 1272.148]
2025-09-12 19:37:01,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 923.0, 1000.0, 677.0, 360.0, 578.0, 1000.0, 1000.0, 340.0]
2025-09-12 19:37:01,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 34 seconds)
2025-09-12 19:48:22,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:48:22,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:51:45,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2966.16943 ± 1370.790
2025-09-12 19:51:45,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4030.057, 3910.557, 4138.814, 2038.8257, 368.55566, 3063.4758, 636.1873, 3924.4727, 4018.2817, 3532.4688]
2025-09-12 19:51:45,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 531.0, 139.0, 746.0, 195.0, 1000.0, 1000.0, 862.0]
2025-09-12 19:51:45,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 12 seconds)
2025-09-12 20:03:50,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:03:50,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:08:10,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3850.04761 ± 593.433
2025-09-12 20:08:10,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3969.9292, 3911.7295, 2095.4397, 4008.1716, 3973.572, 4169.9106, 4261.2373, 4062.7588, 3962.687, 4085.0427]
2025-09-12 20:08:10,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:08:10,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1251 [DEBUG]: Training session finished
