2025-05-03 04:11:25,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-05-03 04:11:25,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay
2025-05-03 04:11:25,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7fa676e6fb20>}
2025-05-03 04:11:25,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1009 [DEBUG]: using device: cuda
2025-05-03 04:11:25,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1031 [INFO]: Creating new trainer
2025-05-03 04:11:25,943 baseline-mbpac-noisy-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-03 04:11:25,943 baseline-mbpac-noisy-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-03 04:11:25,953 baseline-mbpac-noisy-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-05-03 04:11:26,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1092 [DEBUG]: Starting training session...
2025-05-03 04:11:26,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 1/100
2025-05-03 04:21:50,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 04:21:50,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 04:22:47,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 137.65280 ± 118.122
2025-05-03 04:22:47,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [471.07285, 213.3915, 98.61263, 72.53803, 108.397125, 106.00683, 70.608765, 84.54531, 74.43687, 76.91796]
2025-05-03 04:22:47,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [354.0, 328.0, 212.0, 184.0, 222.0, 219.0, 181.0, 194.0, 185.0, 188.0]
2025-05-03 04:22:47,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (137.65) for latency ExtremeClogL1U23
2025-05-03 04:22:47,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 04:22:47,422 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 04:22:47,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 43 minutes, 2 seconds)
2025-05-03 04:34:42,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 04:34:42,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 04:35:33,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 119.36809 ± 118.711
2025-05-03 04:35:33,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [354.97607, 28.576292, 96.09537, 155.75601, -53.644444, 20.149944, 219.70459, -3.7657118, 209.11514, 166.7176]
2025-05-03 04:35:33,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [341.0, 202.0, 196.0, 301.0, 216.0, 186.0, 156.0, 124.0, 204.0, 152.0]
2025-05-03 04:35:33,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 3/100 (estimated time remaining: 19 hours, 41 minutes, 22 seconds)
2025-05-03 04:45:23,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 04:45:23,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 04:46:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 368.22427 ± 239.860
2025-05-03 04:46:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [200.01991, 220.01917, 320.2739, 157.79442, 1040.2415, 408.6449, 228.3234, 380.8197, 310.7708, 415.33524]
2025-05-03 04:46:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [268.0, 117.0, 178.0, 240.0, 1000.0, 279.0, 141.0, 246.0, 193.0, 301.0]
2025-05-03 04:46:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (368.22) for latency ExtremeClogL1U23
2025-05-03 04:46:38,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 04:46:38,157 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 04:46:38,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 57 minutes, 47 seconds)
2025-05-03 04:56:26,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 04:56:26,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 04:57:18,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 292.26978 ± 139.025
2025-05-03 04:57:18,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [267.81412, 363.01678, 297.66394, 194.89851, 262.30893, 681.0762, 191.22728, 198.85565, 225.29138, 240.54501]
2025-05-03 04:57:18,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [215.0, 224.0, 218.0, 121.0, 161.0, 632.0, 121.0, 111.0, 131.0, 134.0]
2025-05-03 04:57:18,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 20 minutes, 30 seconds)
2025-05-03 05:07:05,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 05:07:05,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 05:08:23,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 362.42877 ± 190.843
2025-05-03 05:08:23,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [394.3716, 260.66583, 260.43933, 268.09738, 123.52354, 298.20755, 503.6187, 861.26416, 350.06082, 304.03903]
2025-05-03 05:08:23,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [271.0, 192.0, 139.0, 180.0, 212.0, 408.0, 526.0, 806.0, 217.0, 174.0]
2025-05-03 05:08:23,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 1 minute, 59 seconds)
2025-05-03 05:18:25,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 05:18:25,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 05:19:04,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 263.35287 ± 111.736
2025-05-03 05:19:04,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [245.62163, 56.305576, 263.6074, 255.9274, 292.62198, 239.99147, 325.39606, 265.9407, 166.11688, 521.9999]
2025-05-03 05:19:04,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [131.0, 76.0, 128.0, 135.0, 159.0, 137.0, 161.0, 135.0, 185.0, 338.0]
2025-05-03 05:19:04,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 38 minutes, 14 seconds)
2025-05-03 05:28:43,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 05:28:43,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 05:29:39,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 351.41138 ± 70.105
2025-05-03 05:29:39,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [298.21176, 480.06357, 352.83267, 295.30148, 339.19537, 272.37424, 270.99472, 341.31198, 450.47522, 413.353]
2025-05-03 05:29:39,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [167.0, 319.0, 174.0, 370.0, 147.0, 151.0, 153.0, 185.0, 300.0, 235.0]
2025-05-03 05:29:39,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 8/100 (estimated time remaining: 16 hours, 46 minutes, 13 seconds)
2025-05-03 05:39:24,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 05:39:24,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 05:40:17,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 424.79810 ± 58.722
2025-05-03 05:40:17,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [403.74017, 428.76, 470.21744, 572.3411, 385.19608, 385.53354, 407.36166, 450.07535, 354.44324, 390.3119]
2025-05-03 05:40:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [197.0, 210.0, 301.0, 240.0, 204.0, 218.0, 209.0, 214.0, 164.0, 199.0]
2025-05-03 05:40:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (424.80) for latency ExtremeClogL1U23
2025-05-03 05:40:17,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 05:40:17,988 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 05:40:18,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 9/100 (estimated time remaining: 16 hours, 27 minutes, 24 seconds)
2025-05-03 05:50:09,062 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 05:50:09,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 05:51:00,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 367.27319 ± 33.358
2025-05-03 05:51:00,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [422.81387, 331.80322, 367.88428, 394.31598, 350.13123, 405.20987, 317.27835, 342.727, 345.7937, 394.77438]
2025-05-03 05:51:00,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [230.0, 176.0, 199.0, 230.0, 168.0, 278.0, 165.0, 229.0, 184.0, 213.0]
2025-05-03 05:51:00,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 17 minutes, 21 seconds)
2025-05-03 06:00:47,618 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 06:00:47,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 06:01:35,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 379.68481 ± 56.727
2025-05-03 06:01:35,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [304.45898, 352.03607, 336.28607, 447.88348, 349.92447, 391.29523, 308.62213, 403.47668, 417.5936, 485.2716]
2025-05-03 06:01:35,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [139.0, 172.0, 167.0, 203.0, 167.0, 192.0, 162.0, 257.0, 202.0, 250.0]
2025-05-03 06:01:35,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 11/100 (estimated time remaining: 15 hours, 57 minutes, 29 seconds)
2025-05-03 06:11:20,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 06:11:20,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 06:12:10,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 403.48560 ± 169.455
2025-05-03 06:12:10,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [660.87616, 467.7946, 580.088, 419.2502, 3.0416563, 434.8283, 453.89902, 409.21582, 278.52225, 327.33978]
2025-05-03 06:12:10,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [306.0, 243.0, 265.0, 202.0, 12.0, 234.0, 230.0, 179.0, 153.0, 161.0]
2025-05-03 06:12:10,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 12/100 (estimated time remaining: 15 hours, 45 minutes, 2 seconds)
2025-05-03 06:22:03,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 06:22:03,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 06:23:14,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 497.92242 ± 143.179
2025-05-03 06:23:14,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [376.38257, 360.4456, 331.5186, 815.2543, 507.76474, 508.66208, 414.46506, 425.97754, 596.07166, 642.6821]
2025-05-03 06:23:14,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [195.0, 210.0, 224.0, 578.0, 271.0, 258.0, 277.0, 265.0, 303.0, 309.0]
2025-05-03 06:23:14,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (497.92) for latency ExtremeClogL1U23
2025-05-03 06:23:14,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 06:23:15,003 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 06:23:15,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 13/100 (estimated time remaining: 15 hours, 43 minutes, 17 seconds)
2025-05-03 06:33:08,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 06:33:08,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 06:34:04,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 431.06781 ± 90.177
2025-05-03 06:34:04,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [439.64603, 224.09106, 351.6972, 479.30878, 425.72665, 453.58478, 480.67023, 375.94556, 542.2456, 537.76227]
2025-05-03 06:34:04,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [225.0, 129.0, 192.0, 227.0, 232.0, 208.0, 220.0, 199.0, 325.0, 237.0]
2025-05-03 06:34:04,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 35 minutes, 44 seconds)
2025-05-03 06:43:46,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 06:43:46,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 06:44:39,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 428.20428 ± 139.390
2025-05-03 06:44:39,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [437.88797, 303.25626, 474.71222, 377.09805, 203.28595, 439.1369, 723.09845, 379.67264, 594.34686, 349.5478]
2025-05-03 06:44:39,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [211.0, 174.0, 261.0, 159.0, 116.0, 221.0, 300.0, 188.0, 288.0, 177.0]
2025-05-03 06:44:39,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 22 minutes, 48 seconds)
2025-05-03 06:54:47,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 06:54:47,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 06:55:44,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 472.69199 ± 163.515
2025-05-03 06:55:44,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [450.00842, 727.86414, 305.574, 690.0588, 504.55893, 531.87305, 381.4737, 586.00073, 168.6133, 380.8943]
2025-05-03 06:55:44,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [207.0, 304.0, 167.0, 374.0, 245.0, 226.0, 189.0, 281.0, 102.0, 172.0]
2025-05-03 06:55:44,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 20 minutes, 30 seconds)
2025-05-03 07:05:35,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 07:05:35,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 07:06:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 430.10028 ± 136.521
2025-05-03 07:06:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [738.62115, 303.7323, 376.41162, 388.35428, 619.73303, 338.103, 487.02673, 336.941, 315.4903, 396.58926]
2025-05-03 07:06:26,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [307.0, 181.0, 191.0, 182.0, 271.0, 143.0, 206.0, 187.0, 178.0, 199.0]
2025-05-03 07:06:26,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 11 minutes, 43 seconds)
2025-05-03 07:16:18,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 07:16:18,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 07:17:13,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 453.14438 ± 112.563
2025-05-03 07:17:13,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [435.91385, 544.7562, 485.30304, 659.83136, 376.86096, 451.68832, 465.46286, 512.04065, 205.48909, 394.097]
2025-05-03 07:17:13,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [235.0, 257.0, 237.0, 293.0, 182.0, 221.0, 223.0, 225.0, 132.0, 194.0]
2025-05-03 07:17:13,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 18/100 (estimated time remaining: 14 hours, 55 minutes, 52 seconds)
2025-05-03 07:26:52,570 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 07:26:52,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 07:27:47,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 450.23895 ± 191.963
2025-05-03 07:27:47,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [435.2211, 471.65088, 816.5, 267.58392, 712.3456, 328.23376, 222.76143, 493.3632, 215.35988, 539.3699]
2025-05-03 07:27:47,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [210.0, 193.0, 379.0, 136.0, 338.0, 164.0, 121.0, 235.0, 167.0, 262.0]
2025-05-03 07:27:47,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 19/100 (estimated time remaining: 14 hours, 40 minutes, 57 seconds)
2025-05-03 07:37:49,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 07:37:49,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 07:38:38,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 444.84155 ± 163.419
2025-05-03 07:38:38,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [357.24692, 257.54102, 262.70248, 246.06314, 655.75525, 684.0591, 639.6592, 414.70026, 536.76556, 393.92258]
2025-05-03 07:38:38,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [161.0, 144.0, 128.0, 141.0, 260.0, 268.0, 262.0, 177.0, 231.0, 172.0]
2025-05-03 07:38:38,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 20/100 (estimated time remaining: 14 hours, 34 minutes, 36 seconds)
2025-05-03 07:48:16,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 07:48:16,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 07:49:14,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 552.72742 ± 156.590
2025-05-03 07:49:14,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [756.02496, 541.4067, 309.82922, 700.2766, 747.8492, 340.69327, 536.1187, 652.87286, 561.42267, 380.78003]
2025-05-03 07:49:14,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [259.0, 208.0, 174.0, 248.0, 293.0, 218.0, 246.0, 268.0, 219.0, 187.0]
2025-05-03 07:49:14,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (552.73) for latency ExtremeClogL1U23
2025-05-03 07:49:14,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 07:49:14,299 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 07:49:14,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 16 minutes, 4 seconds)
2025-05-03 07:59:08,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 07:59:08,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 08:00:10,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 625.47253 ± 99.295
2025-05-03 08:00:10,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [594.1958, 621.0088, 426.27332, 707.72107, 632.32947, 693.5164, 666.0884, 517.4911, 804.4223, 591.6788]
2025-05-03 08:00:10,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [224.0, 252.0, 201.0, 311.0, 240.0, 273.0, 246.0, 221.0, 341.0, 223.0]
2025-05-03 08:00:10,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (625.47) for latency ExtremeClogL1U23
2025-05-03 08:00:10,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 08:00:10,968 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 08:00:10,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 9 minutes, 7 seconds)
2025-05-03 08:10:01,593 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 08:10:01,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 08:10:57,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 556.09265 ± 311.100
2025-05-03 08:10:57,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [646.5149, 224.27296, 692.57733, 360.94473, 770.10754, 197.20772, 259.92004, 487.48355, 646.30786, 1275.5896]
2025-05-03 08:10:57,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [317.0, 115.0, 229.0, 166.0, 292.0, 132.0, 143.0, 192.0, 233.0, 432.0]
2025-05-03 08:10:57,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 23/100 (estimated time remaining: 13 hours, 58 minutes, 16 seconds)
2025-05-03 08:20:51,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 08:20:51,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 08:21:56,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 725.84399 ± 248.396
2025-05-03 08:21:56,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [454.84363, 870.2573, 1110.4095, 839.4132, 766.7572, 469.59647, 813.2966, 231.87717, 927.17084, 774.81805]
2025-05-03 08:21:56,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [174.0, 311.0, 363.0, 310.0, 266.0, 202.0, 279.0, 132.0, 332.0, 266.0]
2025-05-03 08:21:56,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (725.84) for latency ExtremeClogL1U23
2025-05-03 08:21:56,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 08:21:56,403 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 08:21:56,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 24/100 (estimated time remaining: 13 hours, 53 minutes, 49 seconds)
2025-05-03 08:31:49,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 08:31:49,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 08:32:57,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 764.25000 ± 214.204
2025-05-03 08:32:57,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [592.6051, 554.8222, 913.61597, 738.94305, 768.0194, 1297.4868, 678.26953, 581.5058, 621.95386, 895.2782]
2025-05-03 08:32:57,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [219.0, 209.0, 301.0, 280.0, 269.0, 413.0, 261.0, 220.0, 223.0, 299.0]
2025-05-03 08:32:57,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (764.25) for latency ExtremeClogL1U23
2025-05-03 08:32:57,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 08:32:57,731 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 08:32:57,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 45 minutes, 39 seconds)
2025-05-03 08:42:34,485 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 08:42:34,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 08:43:45,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 763.31561 ± 416.446
2025-05-03 08:43:45,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [621.44135, 62.114838, 829.0167, 51.113018, 1385.16, 1126.8574, 667.73254, 761.26434, 1034.0309, 1094.4253]
2025-05-03 08:43:45,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [271.0, 70.0, 302.0, 55.0, 516.0, 419.0, 252.0, 273.0, 348.0, 357.0]
2025-05-03 08:43:45,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 37 minutes, 46 seconds)
2025-05-03 08:53:44,283 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 08:53:44,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 08:54:40,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 631.99304 ± 302.124
2025-05-03 08:54:40,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [450.95834, 517.9137, 95.045204, 980.09247, 616.3043, 728.5579, 608.69257, 1242.2035, 710.5939, 369.569]
2025-05-03 08:54:40,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [169.0, 181.0, 112.0, 325.0, 213.0, 242.0, 209.0, 374.0, 224.0, 145.0]
2025-05-03 08:54:40,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 26 minutes, 21 seconds)
2025-05-03 09:04:27,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 09:04:27,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 09:05:52,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1034.84290 ± 345.286
2025-05-03 09:05:52,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [716.2403, 815.30804, 1381.4557, 762.1223, 1090.2858, 906.10205, 1156.6133, 1883.2207, 808.3165, 828.76495]
2025-05-03 09:05:52,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [239.0, 279.0, 418.0, 252.0, 368.0, 301.0, 400.0, 615.0, 291.0, 277.0]
2025-05-03 09:05:52,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1034.84) for latency ExtremeClogL1U23
2025-05-03 09:05:52,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 09:05:52,894 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 09:05:52,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 21 minutes, 56 seconds)
2025-05-03 09:16:21,706 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 09:16:21,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 09:17:45,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 969.39685 ± 257.410
2025-05-03 09:17:45,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1507.3855, 903.93585, 955.8034, 1092.9889, 795.925, 1277.6902, 687.0043, 885.5766, 993.6642, 593.9953]
2025-05-03 09:17:45,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [545.0, 321.0, 313.0, 360.0, 267.0, 418.0, 243.0, 295.0, 347.0, 206.0]
2025-05-03 09:17:45,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 23 minutes, 47 seconds)
2025-05-03 09:27:32,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 09:27:32,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 09:29:12,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1027.91431 ± 411.305
2025-05-03 09:29:12,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [639.17944, 1077.8567, 1046.3461, 1532.6531, 775.76935, 758.63605, 271.51895, 1193.2076, 1250.3928, 1733.5825]
2025-05-03 09:29:12,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [292.0, 397.0, 352.0, 517.0, 310.0, 309.0, 159.0, 382.0, 377.0, 545.0]
2025-05-03 09:29:12,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 18 minutes, 35 seconds)
2025-05-03 09:39:41,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 09:39:41,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 09:41:31,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1254.10278 ± 808.959
2025-05-03 09:41:31,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1185.9427, 2500.709, 3032.028, 560.2487, 891.5585, 1307.7689, 552.8031, 498.71164, 1112.3556, 898.9017]
2025-05-03 09:41:31,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [345.0, 758.0, 1000.0, 213.0, 301.0, 425.0, 209.0, 186.0, 425.0, 362.0]
2025-05-03 09:41:31,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1254.10) for latency ExtremeClogL1U23
2025-05-03 09:41:31,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 09:41:31,335 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 09:41:31,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 28 minutes, 42 seconds)
2025-05-03 09:51:38,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 09:51:38,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 09:54:18,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1917.31775 ± 718.548
2025-05-03 09:54:18,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2833.4612, 2034.3523, 1713.6741, 816.9236, 2767.8528, 2163.569, 1753.1239, 1974.0676, 561.72925, 2554.424]
2025-05-03 09:54:18,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 638.0, 532.0, 294.0, 858.0, 726.0, 453.0, 625.0, 250.0, 859.0]
2025-05-03 09:54:18,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1917.32) for latency ExtremeClogL1U23
2025-05-03 09:54:18,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 09:54:18,110 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 09:54:18,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 42 minutes, 57 seconds)
2025-05-03 10:05:00,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 10:05:00,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 10:07:10,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1600.66846 ± 836.262
2025-05-03 10:07:10,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1165.8973, 2410.927, 1080.078, 3183.0146, 2791.7815, 833.1909, 1309.229, 1546.6182, 588.8785, 1097.0692]
2025-05-03 10:07:10,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [397.0, 755.0, 350.0, 1000.0, 770.0, 300.0, 429.0, 523.0, 224.0, 383.0]
2025-05-03 10:07:10,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 53 minutes, 30 seconds)
2025-05-03 10:16:52,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 10:16:52,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 10:20:14,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2448.16138 ± 938.426
2025-05-03 10:20:14,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1576.4329, 3050.7014, 2873.3962, 3431.2349, 2945.729, 898.343, 3155.786, 3245.4282, 2513.5283, 791.035]
2025-05-03 10:20:14,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [523.0, 1000.0, 1000.0, 1000.0, 893.0, 308.0, 1000.0, 1000.0, 761.0, 251.0]
2025-05-03 10:20:14,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2448.16) for latency ExtremeClogL1U23
2025-05-03 10:20:14,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 10:20:14,791 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 10:20:14,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 57 minutes, 19 seconds)
2025-05-03 10:31:15,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 10:31:15,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 10:34:41,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2460.11523 ± 869.395
2025-05-03 10:34:41,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1744.0826, 1057.5785, 3006.6335, 2973.738, 3206.6838, 3105.2053, 3288.0847, 1019.38983, 3209.129, 1990.6272]
2025-05-03 10:34:41,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [563.0, 305.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 366.0, 1000.0, 622.0]
2025-05-03 10:34:41,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2460.12) for latency ExtremeClogL1U23
2025-05-03 10:34:41,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 10:34:41,109 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 10:34:41,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 35/100 (estimated time remaining: 14 hours, 24 minutes, 23 seconds)
2025-05-03 10:44:39,428 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 10:44:39,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 10:46:59,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1668.50745 ± 1282.351
2025-05-03 10:46:59,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3111.1448, 3198.8882, 22.18038, 2888.637, 569.5073, 718.5063, 445.706, 309.42688, 3024.8345, 2396.2417]
2025-05-03 10:46:59,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 38.0, 1000.0, 215.0, 273.0, 225.0, 125.0, 916.0, 711.0]
2025-05-03 10:46:59,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 11 minutes, 11 seconds)
2025-05-03 10:58:10,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 10:58:10,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 11:01:40,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2613.41870 ± 727.311
2025-05-03 11:01:40,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2985.671, 1559.1094, 3086.5967, 2553.453, 1833.3175, 3227.8562, 1282.4369, 3227.751, 3103.5686, 3274.4268]
2025-05-03 11:01:40,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 512.0, 918.0, 737.0, 569.0, 1000.0, 377.0, 1000.0, 1000.0, 1000.0]
2025-05-03 11:01:40,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2613.42) for latency ExtremeClogL1U23
2025-05-03 11:01:40,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 11:01:40,492 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 11:01:40,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 22 minutes, 22 seconds)
2025-05-03 11:11:55,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 11:11:55,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 11:15:28,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2720.34473 ± 961.483
2025-05-03 11:15:28,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3428.9236, 3375.5962, 1378.3656, 3035.7207, 3245.8823, 3264.0857, 3259.396, 3387.4153, 494.1449, 2333.9175]
2025-05-03 11:15:28,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 429.0, 884.0, 1000.0, 1000.0, 994.0, 1000.0, 186.0, 710.0]
2025-05-03 11:15:28,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2720.34) for latency ExtremeClogL1U23
2025-05-03 11:15:28,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 11:15:28,746 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 11:15:28,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 20 minutes, 42 seconds)
2025-05-03 11:26:00,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 11:26:00,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 11:29:21,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2504.09473 ± 1062.920
2025-05-03 11:29:21,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3399.3499, 3449.0493, 3321.56, 3153.9746, 2018.269, 1065.8167, 3322.1172, 3309.0405, 569.64777, 1432.1233]
2025-05-03 11:29:21,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 575.0, 354.0, 1000.0, 1000.0, 211.0, 439.0]
2025-05-03 11:29:21,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 16 minutes, 55 seconds)
2025-05-03 11:39:42,844 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 11:39:42,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 11:42:22,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2069.38965 ± 840.709
2025-05-03 11:42:22,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1180.3561, 2573.1816, 2881.3623, 3205.1401, 3394.6191, 1921.43, 1244.7146, 921.111, 1729.3553, 1642.6267]
2025-05-03 11:42:22,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [354.0, 714.0, 811.0, 907.0, 1000.0, 549.0, 410.0, 312.0, 508.0, 502.0]
2025-05-03 11:42:22,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 45 minutes, 50 seconds)
2025-05-03 11:53:43,156 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 11:53:43,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 11:57:02,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2538.37256 ± 1181.473
2025-05-03 11:57:02,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3385.8638, 1238.2397, 3451.2607, 3355.0525, 3299.181, 3410.5361, 308.90402, 3553.3936, 2488.2393, 893.05566]
2025-05-03 11:57:02,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 371.0, 1000.0, 1000.0, 1000.0, 1000.0, 131.0, 1000.0, 703.0, 279.0]
2025-05-03 11:57:02,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 27 seconds)
2025-05-03 12:07:24,072 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 12:07:24,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 12:10:55,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2732.95776 ± 832.588
2025-05-03 12:10:55,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3511.7563, 2796.969, 3375.4958, 3293.365, 1294.1143, 2693.531, 1164.9458, 3514.8179, 3305.3154, 2379.2659]
2025-05-03 12:10:55,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 792.0, 1000.0, 1000.0, 407.0, 804.0, 333.0, 1000.0, 1000.0, 663.0]
2025-05-03 12:10:55,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2732.96) for latency ExtremeClogL1U23
2025-05-03 12:10:55,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 12:10:55,861 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 12:10:55,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 37 minutes, 13 seconds)
2025-05-03 12:21:20,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 12:21:20,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 12:25:52,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2856.89014 ± 933.209
2025-05-03 12:25:52,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3606.0168, 2436.7112, 3411.1187, 640.80945, 3111.5198, 3363.4902, 1672.5388, 3432.2676, 3428.8674, 3465.5632]
2025-05-03 12:25:52,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 709.0, 1000.0, 236.0, 900.0, 1000.0, 496.0, 1000.0, 1000.0, 1000.0]
2025-05-03 12:25:52,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2856.89) for latency ExtremeClogL1U23
2025-05-03 12:25:52,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 12:25:52,717 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 12:25:52,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 36 minutes, 38 seconds)
2025-05-03 12:40:24,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 12:40:24,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 12:45:41,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2987.49316 ± 799.420
2025-05-03 12:45:41,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3302.09, 3388.385, 1748.7738, 1342.0825, 3425.3293, 2428.9893, 3568.7314, 3718.7566, 3341.8176, 3609.9746]
2025-05-03 12:45:41,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 504.0, 368.0, 1000.0, 649.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 12:45:41,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (2987.49) for latency ExtremeClogL1U23
2025-05-03 12:45:41,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 12:45:41,973 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 12:45:42,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 30 minutes, 21 seconds)
2025-05-03 12:59:56,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 12:59:56,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 13:04:56,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3048.06152 ± 702.615
2025-05-03 13:04:56,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1365.0929, 3389.2693, 2589.501, 2372.4072, 3560.1855, 3552.208, 3619.5493, 3550.4377, 2965.998, 3515.9646]
2025-05-03 13:04:56,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [407.0, 1000.0, 722.0, 657.0, 1000.0, 1000.0, 994.0, 1000.0, 832.0, 1000.0]
2025-05-03 13:04:56,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (3048.06) for latency ExtremeClogL1U23
2025-05-03 13:04:56,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 13:04:56,378 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 13:04:56,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 24 minutes, 42 seconds)
2025-05-03 13:20:44,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 13:20:44,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 13:25:20,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2840.86768 ± 944.966
2025-05-03 13:25:20,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3526.918, 1429.0541, 3646.9382, 2721.8928, 1814.7131, 1159.588, 3682.1533, 3503.5903, 3505.2444, 3418.5854]
2025-05-03 13:25:20,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 413.0, 924.0, 744.0, 567.0, 352.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 13:25:20,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 46/100 (estimated time remaining: 16 hours, 11 minutes, 19 seconds)
2025-05-03 13:39:51,852 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 13:39:51,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 13:44:40,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2849.89893 ± 1008.251
2025-05-03 13:44:40,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3554.2646, 3626.757, 3637.9167, 2993.198, 3487.8328, 918.8627, 2645.4187, 3460.9893, 3250.7139, 923.03613]
2025-05-03 13:44:40,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 850.0, 1000.0, 289.0, 716.0, 1000.0, 936.0, 294.0]
2025-05-03 13:44:40,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 47/100 (estimated time remaining: 16 hours, 52 minutes, 30 seconds)
2025-05-03 13:58:06,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 13:58:06,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 14:03:05,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2899.06299 ± 998.788
2025-05-03 14:03:05,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1613.8033, 3598.665, 3529.3853, 2030.7871, 3554.1926, 699.2544, 3496.9443, 3437.0574, 3580.6333, 3449.9075]
2025-05-03 14:03:05,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [506.0, 1000.0, 1000.0, 601.0, 1000.0, 245.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 14:03:05,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 48/100 (estimated time remaining: 17 hours, 10 minutes, 24 seconds)
2025-05-03 14:16:26,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 14:16:26,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 14:20:15,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3027.27295 ± 1005.930
2025-05-03 14:20:15,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3573.0706, 3405.4814, 3603.4973, 1810.7634, 3582.6755, 3148.9744, 448.07773, 3629.5789, 3611.4214, 3459.1904]
2025-05-03 14:20:15,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 939.0, 1000.0, 524.0, 1000.0, 887.0, 168.0, 1000.0, 1000.0, 1000.0]
2025-05-03 14:20:15,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 49/100 (estimated time remaining: 16 hours, 23 minutes, 26 seconds)
2025-05-03 14:30:12,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 14:30:12,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 14:34:10,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2701.01270 ± 1118.471
2025-05-03 14:34:10,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [15.055642, 3387.9639, 3533.9954, 1900.7207, 3431.6584, 3495.4663, 3573.562, 3451.1313, 2499.0276, 1721.5443]
2025-05-03 14:34:10,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [41.0, 1000.0, 1000.0, 537.0, 1000.0, 1000.0, 1000.0, 1000.0, 693.0, 508.0]
2025-05-03 14:34:10,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 50/100 (estimated time remaining: 15 hours, 10 minutes, 10 seconds)
2025-05-03 14:51:06,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 14:51:06,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 14:55:33,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2354.18213 ± 1343.103
2025-05-03 14:55:33,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3590.393, 3458.0635, 3391.1392, 227.30544, 3609.639, 708.42096, 3052.5713, 1134.3611, 885.74774, 3484.18]
2025-05-03 14:55:33,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 119.0, 1000.0, 238.0, 853.0, 342.0, 280.0, 1000.0]
2025-05-03 14:55:33,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 51/100 (estimated time remaining: 15 hours, 2 minutes, 9 seconds)
2025-05-03 15:09:56,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 15:09:56,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 15:14:20,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2646.18262 ± 1335.649
2025-05-03 15:14:20,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3492.2815, 3559.595, 3498.7178, 1136.2833, 217.86334, 3577.6094, 3472.7544, 3472.7207, 540.45996, 3493.5417]
2025-05-03 15:14:20,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 374.0, 119.0, 1000.0, 1000.0, 1000.0, 210.0, 1000.0]
2025-05-03 15:14:20,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 52/100 (estimated time remaining: 14 hours, 38 minutes, 39 seconds)
2025-05-03 15:29:20,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 15:29:20,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 15:34:44,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3254.26392 ± 591.566
2025-05-03 15:34:44,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3675.526, 3629.96, 3682.2734, 3649.0854, 3666.5227, 3601.6174, 2512.0825, 2416.7283, 2154.7236, 3554.119]
2025-05-03 15:34:44,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 696.0, 667.0, 599.0, 1000.0]
2025-05-03 15:34:44,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (3254.26) for latency ExtremeClogL1U23
2025-05-03 15:34:44,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 15:34:44,341 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 15:34:44,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 53/100 (estimated time remaining: 14 hours, 39 minutes, 52 seconds)
2025-05-03 15:49:14,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 15:49:14,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 15:54:05,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2942.67188 ± 1107.867
2025-05-03 15:54:05,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3635.87, 1372.3413, 3669.5947, 3669.0413, 1178.4832, 3667.338, 3721.9817, 3692.421, 3613.2356, 1206.4126]
2025-05-03 15:54:05,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 401.0, 1000.0, 1000.0, 342.0, 1000.0, 1000.0, 1000.0, 1000.0, 345.0]
2025-05-03 15:54:05,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 54/100 (estimated time remaining: 14 hours, 41 minutes, 57 seconds)
2025-05-03 16:09:03,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 16:09:03,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 16:14:28,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3693.37256 ± 32.081
2025-05-03 16:14:28,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3734.3896, 3679.6343, 3684.44, 3734.5315, 3725.1646, 3674.115, 3646.9941, 3662.9531, 3662.7537, 3728.7515]
2025-05-03 16:14:28,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 16:14:28,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (3693.37) for latency ExtremeClogL1U23
2025-05-03 16:14:28,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 16:14:28,241 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 16:14:28,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 55/100 (estimated time remaining: 15 hours, 22 minutes, 44 seconds)
2025-05-03 16:26:50,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 16:26:50,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 16:31:18,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3507.06787 ± 601.333
2025-05-03 16:31:18,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3688.8323, 3788.8945, 3754.4368, 3733.7407, 1716.4563, 3511.7664, 3677.3718, 3773.4387, 3712.16, 3713.5825]
2025-05-03 16:31:18,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 523.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 16:31:18,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 56/100 (estimated time remaining: 14 hours, 21 minutes, 45 seconds)
2025-05-03 16:41:58,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 16:41:58,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 16:45:28,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2909.64624 ± 1091.957
2025-05-03 16:45:28,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3655.3801, 3647.628, 3793.0688, 1465.0319, 1838.9221, 3678.403, 621.76306, 3270.315, 3561.5835, 3564.3667]
2025-05-03 16:45:28,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 467.0, 506.0, 1000.0, 206.0, 852.0, 1000.0, 1000.0]
2025-05-03 16:45:28,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 22 minutes, 2 seconds)
2025-05-03 16:56:33,243 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 16:56:33,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 17:00:11,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3080.46826 ± 1168.332
2025-05-03 17:00:11,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3621.5393, 211.92863, 3654.4275, 3661.58, 1400.1641, 3687.5469, 3647.6736, 3713.6611, 3583.9246, 3622.2373]
2025-05-03 17:00:11,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 94.0, 1000.0, 1000.0, 410.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 17:00:11,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 14 minutes, 49 seconds)
2025-05-03 17:09:45,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 17:09:45,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 17:12:30,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2292.48682 ± 1420.961
2025-05-03 17:12:30,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1369.1013, 159.71947, 2529.8396, 3712.6912, 3665.4834, 3750.2756, 1175.9774, 10.886586, 2874.401, 3676.4932]
2025-05-03 17:12:30,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [400.0, 120.0, 685.0, 1000.0, 1000.0, 1000.0, 360.0, 31.0, 803.0, 1000.0]
2025-05-03 17:12:30,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 58 minutes, 41 seconds)
2025-05-03 17:23:02,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 17:23:02,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 17:25:52,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2375.14795 ± 1429.678
2025-05-03 17:25:52,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [318.2004, 248.81206, 3719.4705, 3537.3333, 614.68964, 3632.569, 3710.7468, 2661.3442, 3610.1418, 1698.1698]
2025-05-03 17:25:52,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [126.0, 106.0, 1000.0, 1000.0, 201.0, 1000.0, 1000.0, 692.0, 1000.0, 475.0]
2025-05-03 17:25:52,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 45 minutes, 28 seconds)
2025-05-03 17:36:09,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 17:36:09,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 17:39:56,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3285.42334 ± 930.038
2025-05-03 17:39:56,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3812.0244, 3699.1357, 3795.469, 3491.7021, 3762.0444, 3790.9768, 1273.6193, 3816.4094, 3805.5203, 1607.3326]
2025-05-03 17:39:56,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 381.0, 1000.0, 992.0, 451.0]
2025-05-03 17:39:56,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 9 minutes, 5 seconds)
2025-05-03 17:50:07,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 17:50:07,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 17:53:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2558.91748 ± 1416.809
2025-05-03 17:53:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3669.4353, 3037.076, 3840.3203, 1946.9398, 3774.8013, 1125.1257, 3744.6206, 3819.6375, 335.24008, 295.9765]
2025-05-03 17:53:06,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 810.0, 1000.0, 564.0, 1000.0, 345.0, 1000.0, 1000.0, 138.0, 123.0]
2025-05-03 17:53:06,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 47 minutes, 28 seconds)
2025-05-03 18:03:50,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 18:03:50,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 18:07:12,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2943.14038 ± 1323.964
2025-05-03 18:07:12,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3700.0947, 3791.514, 30.729292, 1684.0419, 1317.9879, 3783.6243, 3823.7913, 3742.7976, 3705.1638, 3851.658]
2025-05-03 18:07:12,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 54.0, 475.0, 363.0, 986.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 18:07:12,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 29 minutes, 20 seconds)
2025-05-03 18:17:09,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 18:17:09,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 18:20:08,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2592.89648 ± 1454.664
2025-05-03 18:20:08,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3759.0786, 2330.6646, 3537.0134, 14.056104, 189.8106, 1333.2491, 3692.8943, 3701.5317, 3649.8396, 3720.8276]
2025-05-03 18:20:08,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 640.0, 930.0, 40.0, 121.0, 359.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 18:20:08,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 20 minutes, 33 seconds)
2025-05-03 18:30:21,228 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 18:30:21,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 18:33:25,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2722.27979 ± 1577.458
2025-05-03 18:33:25,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3873.366, 3863.2922, 3886.972, 3949.3804, 3880.7942, 256.58493, 3824.7053, 578.14417, 244.44588, 2865.1157]
2025-05-03 18:33:25,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 108.0, 1000.0, 179.0, 98.0, 746.0]
2025-05-03 18:33:25,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 6 minutes, 25 seconds)
2025-05-03 18:43:41,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 18:43:41,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 18:47:23,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3251.07544 ± 1009.582
2025-05-03 18:47:23,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2964.6882, 3816.0684, 3744.0442, 367.195, 3757.5356, 3765.1006, 3099.034, 3704.6687, 3937.4966, 3354.9214]
2025-05-03 18:47:23,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [790.0, 1000.0, 1000.0, 149.0, 1000.0, 1000.0, 816.0, 1000.0, 1000.0, 871.0]
2025-05-03 18:47:23,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 52 minutes, 8 seconds)
2025-05-03 18:57:33,760 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 18:57:33,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 19:01:07,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3127.86377 ± 1340.439
2025-05-03 19:01:07,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [666.4338, 3833.8928, 3922.6384, 3919.818, 279.28793, 3321.977, 3811.317, 3776.071, 3863.636, 3883.5671]
2025-05-03 19:01:07,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [213.0, 1000.0, 1000.0, 1000.0, 109.0, 871.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 19:01:07,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 42 minutes, 30 seconds)
2025-05-03 19:11:30,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 19:11:30,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 19:14:13,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2389.91724 ± 1339.178
2025-05-03 19:14:13,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [595.89984, 202.7743, 3800.1826, 1892.1232, 3855.6526, 3844.0388, 1545.2402, 1684.1857, 2616.1814, 3862.892]
2025-05-03 19:14:13,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [211.0, 88.0, 1000.0, 507.0, 1000.0, 1000.0, 415.0, 464.0, 695.0, 1000.0]
2025-05-03 19:14:13,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 22 minutes, 22 seconds)
2025-05-03 19:25:07,119 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 19:25:07,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 19:29:08,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3656.61914 ± 455.017
2025-05-03 19:29:08,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2708.9182, 2833.008, 3917.2244, 3900.269, 3988.8088, 3610.4749, 3988.2178, 3866.2544, 3918.7263, 3834.2905]
2025-05-03 19:29:08,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [727.0, 753.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 19:29:08,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 21 minutes, 37 seconds)
2025-05-03 19:39:03,109 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 19:39:03,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 19:42:37,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3235.88330 ± 1189.824
2025-05-03 19:42:37,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3890.3904, 3807.185, 3871.419, 37.604862, 3847.5576, 3040.1748, 3838.487, 3966.1335, 2200.636, 3859.2427]
2025-05-03 19:42:37,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 53.0, 1000.0, 794.0, 1000.0, 1000.0, 578.0, 1000.0]
2025-05-03 19:42:37,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 8 minutes, 59 seconds)
2025-05-03 19:52:52,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 19:52:52,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 19:56:23,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3148.06958 ± 979.510
2025-05-03 19:56:23,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3806.8557, 1822.8394, 1214.4182, 3855.7468, 2870.7322, 3910.7825, 3898.6265, 3903.191, 3925.5828, 2271.9214]
2025-05-03 19:56:23,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 492.0, 374.0, 1000.0, 750.0, 1000.0, 1000.0, 1000.0, 1000.0, 580.0]
2025-05-03 19:56:23,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 54 minutes, 2 seconds)
2025-05-03 20:06:37,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 20:06:37,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 20:10:00,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2962.24146 ± 1294.862
2025-05-03 20:10:00,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3753.6235, 3856.69, 243.6094, 3815.4607, 3745.5713, 3723.6235, 1535.284, 3742.4924, 1353.0209, 3853.0374]
2025-05-03 20:10:00,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 105.0, 1000.0, 1000.0, 1000.0, 419.0, 1000.0, 389.0, 1000.0]
2025-05-03 20:10:00,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 39 minutes, 32 seconds)
2025-05-03 20:20:25,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 20:20:25,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 20:23:12,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2350.70776 ± 1444.710
2025-05-03 20:23:12,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [163.46611, 997.70123, 422.73364, 3769.6865, 3796.9343, 3763.288, 1057.2064, 3090.442, 3813.6543, 2631.9648]
2025-05-03 20:23:12,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [98.0, 307.0, 167.0, 1000.0, 1000.0, 1000.0, 297.0, 830.0, 1000.0, 701.0]
2025-05-03 20:23:12,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 26 minutes, 15 seconds)
2025-05-03 20:32:57,409 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 20:32:57,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 20:36:49,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3494.38428 ± 767.046
2025-05-03 20:36:49,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3804.4243, 3812.5488, 3889.3447, 3857.5422, 3780.7075, 3757.8652, 1309.5996, 3027.9211, 3865.7964, 3838.095]
2025-05-03 20:36:49,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 368.0, 793.0, 1000.0, 1000.0]
2025-05-03 20:36:49,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 5 minutes, 28 seconds)
2025-05-03 20:47:45,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 20:47:45,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 20:51:54,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3773.02271 ± 331.461
2025-05-03 20:51:54,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3826.4817, 3882.7117, 3899.3713, 3935.1306, 3916.8855, 2783.1787, 3862.0085, 3910.6873, 3864.273, 3849.4988]
2025-05-03 20:51:54,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 734.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 20:51:54,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (3773.02) for latency ExtremeClogL1U23
2025-05-03 20:51:54,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-03 20:51:54,454 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 20:51:54,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 16 seconds)
2025-05-03 21:01:59,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 21:01:59,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 21:06:03,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3731.26953 ± 579.152
2025-05-03 21:06:03,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3818.2876, 3882.2156, 4034.8918, 3890.583, 4018.5503, 3891.16, 3956.3923, 2003.8834, 3929.4138, 3887.3162]
2025-05-03 21:06:03,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 528.0, 1000.0, 1000.0]
2025-05-03 21:06:03,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 48 minutes, 21 seconds)
2025-05-03 21:16:47,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 21:16:47,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 21:20:49,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3604.59229 ± 978.070
2025-05-03 21:20:49,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3781.3933, 3977.9976, 4051.6194, 3931.4768, 3963.303, 3841.9224, 3782.945, 3967.771, 4063.446, 684.047]
2025-05-03 21:20:49,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 241.0]
2025-05-03 21:20:49,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 39 minutes, 55 seconds)
2025-05-03 21:30:39,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 21:30:39,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 21:33:58,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2953.77881 ± 1440.188
2025-05-03 21:33:58,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3940.0251, 1791.6115, 622.64996, 3877.0627, 3902.3545, 3716.686, 97.21093, 3792.7612, 3898.5923, 3898.832]
2025-05-03 21:33:58,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 485.0, 207.0, 1000.0, 1000.0, 1000.0, 71.0, 1000.0, 1000.0, 1000.0]
2025-05-03 21:33:58,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 25 minutes, 35 seconds)
2025-05-03 21:44:55,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 21:44:55,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 21:48:09,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2882.42114 ± 1547.449
2025-05-03 21:48:09,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3928.625, 26.95926, 3860.8835, 3686.0635, 3956.0632, 3960.1, 3852.42, 1456.6283, 3839.1746, 257.2946]
2025-05-03 21:48:09,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 47.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 408.0, 1000.0, 118.0]
2025-05-03 21:48:09,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 13 minutes, 51 seconds)
2025-05-03 21:57:57,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 21:57:57,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 22:01:48,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3422.37109 ± 1060.110
2025-05-03 22:01:48,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3758.2522, 3797.7712, 3866.2034, 3781.3147, 342.1645, 3944.0537, 3875.9138, 3921.2915, 3937.4597, 2999.2878]
2025-05-03 22:01:48,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 132.0, 1000.0, 1000.0, 1000.0, 1000.0, 759.0]
2025-05-03 22:01:48,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 53 minutes, 34 seconds)
2025-05-03 22:13:09,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 22:13:09,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 22:18:20,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3730.06201 ± 895.572
2025-05-03 22:18:20,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4157.182, 4072.9277, 4157.3, 4056.1482, 4117.334, 3989.6262, 1059.8784, 3870.518, 3912.9097, 3906.7983]
2025-05-03 22:18:20,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 295.0, 1000.0, 1000.0, 1000.0]
2025-05-03 22:18:20,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 49 minutes, 6 seconds)
2025-05-03 22:34:06,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 22:34:06,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 22:40:02,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3300.28247 ± 1056.334
2025-05-03 22:40:02,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3894.7058, 3420.4224, 3874.309, 3987.7886, 2548.9238, 3804.6296, 3784.885, 3999.5544, 3293.471, 394.13477]
2025-05-03 22:40:02,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 867.0, 1000.0, 1000.0, 656.0, 933.0, 941.0, 1000.0, 843.0, 150.0]
2025-05-03 22:40:02,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 1 minute, 1 second)
2025-05-03 23:01:40,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 23:01:40,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 23:08:03,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3566.14111 ± 1041.012
2025-05-03 23:08:03,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3902.341, 3888.2734, 3941.0698, 445.4382, 3887.3203, 3865.3853, 3919.6414, 3930.4683, 3870.544, 4010.9297]
2025-05-03 23:08:03,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 161.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 23:08:03,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 38 minutes, 41 seconds)
2025-05-03 23:27:02,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 23:27:02,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 23:33:04,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3594.76294 ± 1021.313
2025-05-03 23:33:04,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3902.425, 4003.1016, 4009.836, 4004.1252, 3880.996, 3793.2, 3947.3452, 3866.1816, 4002.4573, 537.9595]
2025-05-03 23:33:04,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 976.0, 1000.0, 1000.0, 1000.0, 1000.0, 189.0]
2025-05-03 23:33:04,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 56 minutes, 42 seconds)
2025-05-03 23:47:59,852 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 23:47:59,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 23:54:04,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3605.70068 ± 1000.343
2025-05-03 23:54:04,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4068.607, 4048.053, 3988.244, 3966.04, 3613.5657, 627.3226, 4026.31, 3877.1304, 3939.27, 3902.4634]
2025-05-03 23:54:04,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 911.0, 193.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-03 23:54:04,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 59 minutes, 16 seconds)
2025-05-04 00:10:25,354 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 00:10:25,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 00:16:29,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3822.16260 ± 959.953
2025-05-04 00:16:29,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4225.908, 4173.7446, 4209.5796, 956.86127, 4080.059, 4176.0576, 4104.088, 4151.057, 3893.1594, 4251.1094]
2025-05-04 00:16:29,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 285.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-04 00:16:29,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (3822.16) for latency ExtremeClogL1U23
2025-05-04 00:16:29,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-04 00:16:29,860 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-04 00:16:29,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 86/100 (estimated time remaining: 5 hours, 54 minutes, 28 seconds)
2025-05-04 00:31:32,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 00:31:32,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 00:35:44,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3565.59912 ± 885.928
2025-05-04 00:35:44,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1096.9801, 3876.8628, 4021.8345, 4125.8013, 4071.9343, 4023.7354, 3756.7783, 4065.362, 2954.2195, 3662.483]
2025-05-04 00:35:44,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [304.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 960.0, 1000.0, 730.0, 924.0]
2025-05-04 00:35:44,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 87/100 (estimated time remaining: 5 hours, 23 minutes, 58 seconds)
2025-05-04 00:46:07,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 00:46:07,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 00:50:00,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3649.83447 ± 1081.131
2025-05-04 00:50:00,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4052.4307, 3992.9045, 416.50247, 4018.438, 3963.1443, 4126.5303, 3944.8665, 3908.1003, 3899.2964, 4176.1304]
2025-05-04 00:50:00,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 141.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-04 00:50:00,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 25 minutes, 3 seconds)
2025-05-04 01:00:01,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 01:00:01,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 01:03:55,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3667.72607 ± 690.828
2025-05-04 01:03:55,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4000.3713, 3955.9502, 3050.423, 3995.2783, 3941.803, 3938.0974, 3991.54, 3926.8708, 4101.751, 1775.1764]
2025-05-04 01:03:55,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 796.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 497.0]
2025-05-04 01:03:55,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 38 minutes, 3 seconds)
2025-05-04 01:14:07,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 01:14:07,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 01:17:34,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3134.96680 ± 1157.931
2025-05-04 01:17:34,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3701.4294, 1921.8884, 3916.524, 3835.4785, 3827.1423, 3907.105, 1233.1816, 1042.6208, 4047.7563, 3916.54]
2025-05-04 01:17:34,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 526.0, 1000.0, 1000.0, 1000.0, 1000.0, 350.0, 290.0, 1000.0, 1000.0]
2025-05-04 01:17:34,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 3 minutes, 41 seconds)
2025-05-04 01:28:26,812 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 01:28:26,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 01:32:36,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3863.59131 ± 232.518
2025-05-04 01:32:36,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4050.978, 4006.9976, 3690.0112, 3934.4053, 3254.0999, 3885.4053, 3853.6287, 3854.873, 3997.4126, 4108.1016]
2025-05-04 01:32:36,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 994.0, 820.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-04 01:32:36,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (3863.59) for latency ExtremeClogL1U23
2025-05-04 01:32:36,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-04 01:32:36,696 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-walker2d/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-04 01:32:36,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2025-05-04 01:42:11,788 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 01:42:11,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 01:44:54,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 2439.62524 ± 1643.616
2025-05-04 01:44:54,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3940.6458, 4176.539, 3831.8198, 1244.2504, 4078.3022, 4064.4255, 773.00507, 51.965065, 383.2587, 1852.0404]
2025-05-04 01:44:54,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [946.0, 1000.0, 1000.0, 370.0, 1000.0, 1000.0, 244.0, 63.0, 147.0, 537.0]
2025-05-04 01:44:54,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 4 minutes, 29 seconds)
2025-05-04 01:55:08,934 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 01:55:08,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 01:59:08,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3732.94141 ± 959.128
2025-05-04 01:59:08,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4107.9165, 4231.472, 4085.4878, 3962.3235, 4058.9834, 4218.2827, 889.6873, 3673.4814, 4023.1096, 4078.6704]
2025-05-04 01:59:08,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 248.0, 1000.0, 988.0, 1000.0]
2025-05-04 01:59:08,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 50 minutes, 36 seconds)
2025-05-04 02:09:00,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 02:09:00,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 02:12:44,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3429.19604 ± 1211.084
2025-05-04 02:12:44,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4103.1597, 4118.814, 4061.4358, 4051.3535, 3974.8508, 1313.2677, 4012.293, 3993.6838, 733.5135, 3929.5862]
2025-05-04 02:12:44,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 366.0, 1000.0, 1000.0, 236.0, 1000.0]
2025-05-04 02:12:44,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 36 minutes, 19 seconds)
2025-05-04 02:23:04,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 02:23:04,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 02:26:39,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3320.06714 ± 1500.410
2025-05-04 02:26:39,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3854.3762, 4018.2698, 4079.505, 4182.942, 4073.0115, 4022.96, 20.774536, 657.71246, 4247.2686, 4043.8528]
2025-05-04 02:26:39,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 48.0, 227.0, 1000.0, 1000.0]
2025-05-04 02:26:39,660 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 22 minutes, 54 seconds)
2025-05-04 02:37:06,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 02:37:06,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 02:41:00,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3668.58276 ± 808.078
2025-05-04 02:41:00,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2831.6343, 4109.532, 1513.8293, 3833.0364, 4048.9756, 4106.9795, 4120.8384, 4002.3381, 4117.574, 4001.09]
2025-05-04 02:41:00,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [713.0, 1000.0, 413.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-04 02:41:00,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 8 minutes, 23 seconds)
2025-05-04 02:51:15,194 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 02:51:15,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 02:55:31,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3803.89917 ± 231.049
2025-05-04 02:55:31,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3789.2522, 3916.0417, 3872.4512, 3899.1116, 3858.418, 3118.5105, 3891.5469, 3882.3857, 3903.6023, 3907.673]
2025-05-04 02:55:31,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 806.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-04 02:55:31,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 97/100 (estimated time remaining: 56 minutes, 29 seconds)
2025-05-04 03:06:23,509 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 03:06:23,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 03:10:25,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3799.02783 ± 983.689
2025-05-04 03:10:25,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4171.1724, 3995.9446, 4038.1033, 4324.3564, 3983.5938, 3976.483, 4134.731, 4225.877, 4270.686, 869.3285]
2025-05-04 03:10:25,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 256.0]
2025-05-04 03:10:25,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 98/100 (estimated time remaining: 42 minutes, 46 seconds)
2025-05-04 03:20:07,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 03:20:07,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 03:24:00,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3595.04053 ± 881.477
2025-05-04 03:24:00,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3957.769, 2807.0725, 4070.9402, 4084.3152, 3941.897, 4075.516, 1180.4921, 3993.6829, 3943.726, 3894.996]
2025-05-04 03:24:00,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 727.0, 1000.0, 1000.0, 1000.0, 1000.0, 377.0, 1000.0, 1000.0, 1000.0]
2025-05-04 03:24:00,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 99/100 (estimated time remaining: 28 minutes, 30 seconds)
2025-05-04 03:34:22,263 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 03:34:22,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 03:38:36,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3835.20630 ± 366.494
2025-05-04 03:38:36,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [3892.2244, 4006.1829, 3852.7717, 2778.9114, 4141.465, 3967.5686, 3912.9124, 3757.378, 3990.2087, 4052.4434]
2025-05-04 03:38:36,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 717.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-04 03:38:36,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 23 seconds)
2025-05-04 03:49:00,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-04 03:49:00,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-04 03:52:13,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 3098.21045 ± 1492.111
2025-05-04 03:52:13,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [4072.3816, 816.36316, 4220.8696, 4215.59, 95.13562, 1995.7448, 3113.8657, 4035.825, 4103.4097, 4312.923]
2025-05-04 03:52:13,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [1000.0, 234.0, 1000.0, 1000.0, 92.0, 499.0, 780.0, 1000.0, 1000.0, 1000.0]
2025-05-04 03:52:13,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1149 [DEBUG]: Training session finished
