2025-05-06 15:36:41,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-05-06 15:36:41,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-05-06 15:36:41,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151db185d0d0>}
2025-05-06 15:36:41,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1009 [DEBUG]: using device: cuda
2025-05-06 15:36:41,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1031 [INFO]: Creating new trainer
2025-05-06 15:36:41,563 baseline-mbpac-noisy-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-06 15:36:41,563 baseline-mbpac-noisy-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 15:36:41,570 baseline-mbpac-noisy-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-05-06 15:36:42,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1092 [DEBUG]: Starting training session...
2025-05-06 15:36:42,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 1/100
2025-05-06 15:50:30,810 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:50:30,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:53:06,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 387.08685 ± 311.059
2025-05-06 15:53:06,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [171.4112, 417.91287, 392.7042, 49.380898, 36.328804, 905.91034, 288.91302, 96.985466, 911.1152, 600.20624]
2025-05-06 15:53:06,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [97.0, 237.0, 236.0, 178.0, 175.0, 1000.0, 172.0, 266.0, 1000.0, 675.0]
2025-05-06 15:53:06,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (387.09) for latency ExtremeSparseL4U32
2025-05-06 15:53:06,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 15:53:06,509 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 15:53:06,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 4 minutes, 2 seconds)
2025-05-06 16:05:16,612 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:05:16,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:06:02,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 7.86525 ± 36.549
2025-05-06 16:06:02,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [-25.597488, -3.6141438, 46.593723, 6.51567, 96.6087, -36.204597, -18.837183, 9.452943, 2.1578655, 1.5770001]
2025-05-06 16:06:02,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [96.0, 74.0, 188.0, 99.0, 295.0, 135.0, 72.0, 122.0, 59.0, 67.0]
2025-05-06 16:06:02,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 3/100 (estimated time remaining: 23 hours, 57 minutes, 11 seconds)
2025-05-06 16:18:17,727 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:18:17,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:19:30,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 272.19296 ± 182.365
2025-05-06 16:19:30,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [579.8394, 42.330948, 113.456375, 34.06483, 321.85483, 400.42697, 292.7664, 527.6223, 135.78957, 273.77832]
2025-05-06 16:19:30,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [508.0, 185.0, 197.0, 42.0, 133.0, 378.0, 168.0, 455.0, 177.0, 184.0]
2025-05-06 16:19:30,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 4/100 (estimated time remaining: 23 hours, 4 minutes, 10 seconds)
2025-05-06 16:31:43,016 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:31:43,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:33:18,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 408.71802 ± 194.523
2025-05-06 16:33:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [461.22318, 34.75203, 442.0609, 767.79663, 401.2634, 527.32635, 415.97, 371.62402, 134.35149, 530.8121]
2025-05-06 16:33:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [249.0, 42.0, 229.0, 482.0, 173.0, 327.0, 256.0, 208.0, 282.0, 282.0]
2025-05-06 16:33:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (408.72) for latency ExtremeSparseL4U32
2025-05-06 16:33:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 16:33:18,538 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:33:18,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 5/100 (estimated time remaining: 22 hours, 38 minutes, 31 seconds)
2025-05-06 16:45:47,805 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:45:47,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:46:54,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 219.02975 ± 148.505
2025-05-06 16:46:54,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [376.15845, 32.934013, 22.48676, 257.55716, 23.949436, 271.95084, 327.10852, 321.03467, 438.35956, 118.75841]
2025-05-06 16:46:54,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [288.0, 44.0, 34.0, 383.0, 96.0, 160.0, 185.0, 210.0, 287.0, 85.0]
2025-05-06 16:46:54,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 13 minutes, 51 seconds)
2025-05-06 16:59:33,096 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:59:33,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:00:19,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 194.22079 ± 178.162
2025-05-06 17:00:19,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [515.22015, 67.61779, 30.575384, 4.0423193, 277.46594, 359.4754, 41.35165, 193.54778, 421.02637, 31.885273]
2025-05-06 17:00:19,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [473.0, 65.0, 38.0, 13.0, 162.0, 189.0, 58.0, 217.0, 257.0, 43.0]
2025-05-06 17:00:21,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 7/100 (estimated time remaining: 21 hours, 4 minutes, 8 seconds)
2025-05-06 17:12:54,061 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:12:54,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:13:45,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 184.73758 ± 141.252
2025-05-06 17:13:45,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [229.38297, 2.782843, 3.004798, 325.06308, 198.99747, 56.5099, 362.37338, 307.61243, 336.34735, 25.301687]
2025-05-06 17:13:45,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [160.0, 12.0, 12.0, 207.0, 148.0, 70.0, 223.0, 203.0, 286.0, 36.0]
2025-05-06 17:13:45,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 8/100 (estimated time remaining: 20 hours, 59 minutes, 39 seconds)
2025-05-06 17:25:17,411 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:25:17,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:25:58,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 178.49469 ± 173.422
2025-05-06 17:25:58,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [371.66425, 40.713116, 41.865154, 75.53576, 31.396955, 412.45193, 27.82099, 276.47577, 471.74695, 35.27612]
2025-05-06 17:25:58,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [219.0, 68.0, 68.0, 138.0, 57.0, 269.0, 48.0, 147.0, 283.0, 56.0]
2025-05-06 17:25:59,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 9/100 (estimated time remaining: 20 hours, 23 minutes, 1 second)
2025-05-06 17:38:00,421 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:38:00,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:39:30,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 302.81470 ± 340.189
2025-05-06 17:39:30,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [566.28625, 693.02313, 387.02173, 1043.251, 168.25873, 24.92452, 51.033714, 37.603916, 45.313484, 11.430748]
2025-05-06 17:39:30,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [207.0, 497.0, 217.0, 1000.0, 233.0, 38.0, 61.0, 63.0, 62.0, 23.0]
2025-05-06 17:39:30,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 10/100 (estimated time remaining: 20 hours, 4 minutes, 48 seconds)
2025-05-06 17:52:19,616 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:52:19,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:53:47,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 243.92082 ± 270.533
2025-05-06 17:53:47,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [16.719814, 44.253883, 222.59407, 211.59059, 943.52045, 49.922905, 24.189306, 464.24994, 316.7056, 145.46178]
2025-05-06 17:53:47,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [25.0, 58.0, 189.0, 214.0, 879.0, 96.0, 82.0, 301.0, 249.0, 228.0]
2025-05-06 17:53:47,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 11/100 (estimated time remaining: 20 hours, 3 minutes, 47 seconds)
2025-05-06 18:06:18,784 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:06:18,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:07:16,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 205.18320 ± 162.601
2025-05-06 18:07:16,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [141.89302, 28.691835, 452.79205, 87.80102, 27.47304, 64.031044, 373.77267, 358.31192, 108.401215, 408.66428]
2025-05-06 18:07:16,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [164.0, 35.0, 237.0, 116.0, 45.0, 112.0, 266.0, 215.0, 83.0, 246.0]
2025-05-06 18:07:16,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 51 minutes, 9 seconds)
2025-05-06 18:20:05,573 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:20:05,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:20:47,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 157.04858 ± 139.394
2025-05-06 18:20:47,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [28.345467, 387.2257, 32.627697, 329.82043, 49.337, 28.628843, 252.40225, 16.549822, 139.50993, 306.03873]
2025-05-06 18:20:47,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [35.0, 264.0, 43.0, 174.0, 51.0, 36.0, 132.0, 30.0, 128.0, 206.0]
2025-05-06 18:20:47,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 39 minutes, 42 seconds)
2025-05-06 18:33:04,034 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:33:04,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:34:50,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 385.36053 ± 134.486
2025-05-06 18:34:50,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [300.2868, 312.8448, 324.54303, 489.35995, 336.1624, 684.73236, 367.8118, 360.10577, 507.01935, 170.73933]
2025-05-06 18:34:50,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [207.0, 174.0, 202.0, 304.0, 213.0, 552.0, 339.0, 430.0, 277.0, 160.0]
2025-05-06 18:34:50,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 14/100 (estimated time remaining: 19 hours, 58 minutes, 15 seconds)
2025-05-06 18:47:15,239 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:47:15,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:48:19,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 332.66095 ± 275.398
2025-05-06 18:48:20,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [417.63278, 39.532257, 248.09868, 297.2119, 267.01343, 79.603424, 262.16904, 288.08072, 331.9892, 1095.2778]
2025-05-06 18:48:20,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [190.0, 48.0, 132.0, 152.0, 145.0, 90.0, 139.0, 151.0, 184.0, 482.0]
2025-05-06 18:48:20,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 43 minutes, 55 seconds)
2025-05-06 18:59:57,983 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:59:57,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:00:51,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 280.44025 ± 170.568
2025-05-06 19:00:51,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [406.4027, 153.0187, 168.88269, 438.68893, 441.98508, 21.411377, 471.45743, 18.135601, 254.94914, 429.47104]
2025-05-06 19:00:51,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [260.0, 216.0, 108.0, 202.0, 258.0, 37.0, 262.0, 33.0, 152.0, 256.0]
2025-05-06 19:00:51,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 16/100 (estimated time remaining: 19 hours, 21 seconds)
2025-05-06 19:12:10,486 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:12:10,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:13:20,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 398.12692 ± 248.669
2025-05-06 19:13:20,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1034.0725, 328.37476, 518.73883, 81.699615, 157.45795, 274.99832, 421.32452, 386.9864, 287.6198, 489.99655]
2025-05-06 19:13:20,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [471.0, 191.0, 261.0, 87.0, 219.0, 153.0, 248.0, 236.0, 167.0, 251.0]
2025-05-06 19:13:20,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 17/100 (estimated time remaining: 18 hours, 29 minutes, 53 seconds)
2025-05-06 19:25:19,853 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:25:19,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:26:45,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 468.00433 ± 126.701
2025-05-06 19:26:45,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [558.0904, 512.9978, 255.29025, 364.1491, 744.84265, 438.53705, 514.0043, 353.4575, 497.71353, 440.9616]
2025-05-06 19:26:45,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [249.0, 231.0, 145.0, 161.0, 292.0, 284.0, 256.0, 166.0, 235.0, 241.0]
2025-05-06 19:26:45,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (468.00) for latency ExtremeSparseL4U32
2025-05-06 19:26:45,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 19:26:45,912 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 19:26:45,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 18/100 (estimated time remaining: 18 hours, 15 minutes, 14 seconds)
2025-05-06 19:39:29,568 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:39:29,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:41:02,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 603.36310 ± 351.519
2025-05-06 19:41:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [789.76385, 501.60663, 23.320084, 172.39236, 721.2063, 692.7347, 263.40097, 828.6862, 762.8257, 1277.6941]
2025-05-06 19:41:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [340.0, 219.0, 32.0, 111.0, 286.0, 265.0, 145.0, 318.0, 312.0, 479.0]
2025-05-06 19:41:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (603.36) for latency ExtremeSparseL4U32
2025-05-06 19:41:02,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 19:41:03,002 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 19:41:03,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 19/100 (estimated time remaining: 18 hours, 5 minutes, 42 seconds)
2025-05-06 19:53:39,176 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:53:39,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:55:23,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 667.38940 ± 164.680
2025-05-06 19:55:23,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [543.66364, 888.5493, 789.8728, 772.38214, 720.4758, 766.8252, 774.25006, 437.12012, 346.96408, 633.7915]
2025-05-06 19:55:23,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [228.0, 384.0, 349.0, 311.0, 278.0, 289.0, 307.0, 212.0, 177.0, 231.0]
2025-05-06 19:55:23,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (667.39) for latency ExtremeSparseL4U32
2025-05-06 19:55:23,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 19:55:23,517 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 19:55:23,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 20/100 (estimated time remaining: 18 hours, 6 minutes, 15 seconds)
2025-05-06 20:08:17,755 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:08:18,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:10:18,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 720.47449 ± 209.755
2025-05-06 20:10:18,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [325.80093, 700.2614, 1096.8131, 637.2241, 932.71375, 706.0019, 818.90857, 553.44006, 878.59607, 554.98425]
2025-05-06 20:10:18,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [169.0, 305.0, 450.0, 315.0, 378.0, 354.0, 363.0, 255.0, 324.0, 275.0]
2025-05-06 20:10:18,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (720.47) for latency ExtremeSparseL4U32
2025-05-06 20:10:18,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 20:10:18,355 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 20:10:18,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 21/100 (estimated time remaining: 18 hours, 31 minutes, 4 seconds)
2025-05-06 20:22:23,594 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:22:23,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:24:08,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 660.03699 ± 88.147
2025-05-06 20:24:08,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [623.1927, 687.1296, 702.7212, 767.5597, 724.87006, 647.9974, 527.1575, 626.45703, 785.80597, 507.47855]
2025-05-06 20:24:08,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [235.0, 301.0, 287.0, 308.0, 322.0, 274.0, 254.0, 225.0, 369.0, 203.0]
2025-05-06 20:24:08,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 22/100 (estimated time remaining: 18 hours, 38 minutes, 47 seconds)
2025-05-06 20:36:54,203 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:36:54,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:38:28,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 582.56885 ± 200.599
2025-05-06 20:38:28,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [709.7022, 641.6512, 87.94872, 367.3765, 784.7775, 527.81995, 591.5316, 747.68506, 692.1458, 675.04974]
2025-05-06 20:38:28,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [312.0, 260.0, 117.0, 157.0, 359.0, 230.0, 253.0, 294.0, 259.0, 272.0]
2025-05-06 20:38:28,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 23/100 (estimated time remaining: 18 hours, 38 minutes, 44 seconds)
2025-05-06 20:51:19,684 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:51:20,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:53:12,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 745.25366 ± 139.306
2025-05-06 20:53:12,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [601.3648, 1040.3871, 774.7651, 615.2059, 936.9405, 634.8528, 792.4362, 625.2044, 739.704, 691.6757]
2025-05-06 20:53:12,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [262.0, 373.0, 287.0, 269.0, 350.0, 271.0, 304.0, 256.0, 329.0, 254.0]
2025-05-06 20:53:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (745.25) for latency ExtremeSparseL4U32
2025-05-06 20:53:12,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 20:53:12,454 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 20:53:12,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 24/100 (estimated time remaining: 18 hours, 31 minutes, 13 seconds)
2025-05-06 21:05:52,866 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:05:52,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:08:52,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1177.79004 ± 341.408
2025-05-06 21:08:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [884.92065, 1674.4031, 1375.4689, 1300.4913, 927.7334, 1505.1703, 1260.4114, 1409.3885, 465.3272, 974.5845]
2025-05-06 21:08:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [345.0, 600.0, 471.0, 583.0, 327.0, 574.0, 539.0, 647.0, 270.0, 378.0]
2025-05-06 21:08:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1177.79) for latency ExtremeSparseL4U32
2025-05-06 21:08:52,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 21:08:52,263 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 21:08:52,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 25/100 (estimated time remaining: 18 hours, 36 minutes, 52 seconds)
2025-05-06 21:21:32,016 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:21:32,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:24:31,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1103.58411 ± 692.055
2025-05-06 21:24:31,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2178.6614, 1614.3066, 336.1718, 2014.4347, 600.46954, 658.041, 1618.6257, 13.888965, 1120.3378, 880.90344]
2025-05-06 21:24:31,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [888.0, 862.0, 229.0, 759.0, 322.0, 284.0, 606.0, 23.0, 439.0, 338.0]
2025-05-06 21:24:31,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 26/100 (estimated time remaining: 18 hours, 33 minutes, 23 seconds)
2025-05-06 21:36:32,178 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:36:32,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:40:08,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1463.45251 ± 619.167
2025-05-06 21:40:08,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2219.3726, 1890.8293, 2556.155, 990.8013, 377.21628, 956.20605, 1258.5449, 1454.2893, 1767.9832, 1163.1266]
2025-05-06 21:40:08,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [823.0, 751.0, 946.0, 369.0, 233.0, 373.0, 526.0, 591.0, 668.0, 485.0]
2025-05-06 21:40:08,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1463.45) for latency ExtremeSparseL4U32
2025-05-06 21:40:08,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-06 21:40:08,658 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 21:40:08,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 27/100 (estimated time remaining: 18 hours, 44 minutes, 48 seconds)
2025-05-06 21:52:54,596 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:52:54,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:55:27,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 987.24628 ± 445.010
2025-05-06 21:55:27,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1067.3872, 1264.2683, 86.0633, 712.07776, 1274.3883, 665.97906, 1424.6508, 1744.1628, 841.9767, 791.509]
2025-05-06 21:55:27,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [440.0, 489.0, 83.0, 311.0, 522.0, 282.0, 525.0, 706.0, 364.0, 323.0]
2025-05-06 21:55:27,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 43 minutes, 57 seconds)
2025-05-06 22:07:25,099 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:07:25,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:09:44,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 930.26270 ± 284.229
2025-05-06 22:09:44,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [781.84564, 1380.6815, 965.80035, 739.40594, 512.83435, 1151.1865, 1373.3873, 623.91315, 767.99866, 1005.5731]
2025-05-06 22:09:44,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [303.0, 510.0, 382.0, 314.0, 219.0, 491.0, 477.0, 269.0, 328.0, 393.0]
2025-05-06 22:09:44,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 21 minutes, 58 seconds)
2025-05-06 22:22:22,547 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:22:22,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:25:00,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1054.83521 ± 309.015
2025-05-06 22:25:00,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1333.9661, 807.0957, 482.78397, 996.66486, 1565.3766, 925.1895, 1194.9065, 1358.619, 754.652, 1129.0964]
2025-05-06 22:25:00,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [490.0, 309.0, 228.0, 398.0, 575.0, 356.0, 483.0, 585.0, 364.0, 398.0]
2025-05-06 22:25:00,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 1 minute, 2 seconds)
2025-05-06 22:37:45,769 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:37:45,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:40:04,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 982.32910 ± 185.742
2025-05-06 22:40:04,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1041.4988, 1021.0926, 984.6818, 597.3623, 1112.1235, 1304.7986, 1015.84705, 870.8401, 1100.9122, 774.13367]
2025-05-06 22:40:04,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [369.0, 389.0, 349.0, 254.0, 382.0, 495.0, 392.0, 332.0, 405.0, 317.0]
2025-05-06 22:40:04,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 31/100 (estimated time remaining: 17 hours, 37 minutes, 37 seconds)
2025-05-06 22:52:52,915 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:52:52,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:55:27,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 982.04022 ± 311.322
2025-05-06 22:55:27,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1045.4575, 759.47864, 1309.9847, 1232.1044, 386.28995, 1392.5509, 727.702, 871.70917, 1308.9956, 786.1292]
2025-05-06 22:55:27,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [577.0, 293.0, 457.0, 625.0, 224.0, 489.0, 293.0, 350.0, 496.0, 320.0]
2025-05-06 22:55:27,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 32/100 (estimated time remaining: 17 hours, 19 minutes, 23 seconds)
2025-05-06 23:07:22,832 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:07:22,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:09:31,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 858.91486 ± 320.316
2025-05-06 23:09:31,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [817.4171, 722.66254, 738.77423, 1214.4159, 1103.0176, 804.6078, 1206.2959, 69.87267, 814.16345, 1097.9215]
2025-05-06 23:09:31,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [300.0, 304.0, 312.0, 438.0, 404.0, 291.0, 492.0, 131.0, 346.0, 398.0]
2025-05-06 23:09:31,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 47 minutes, 21 seconds)
2025-05-06 23:22:24,429 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:22:24,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:24:43,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 955.28955 ± 500.621
2025-05-06 23:24:43,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1921.5586, 632.93085, 683.02856, 1549.2341, 1401.5428, 705.1377, 182.26794, 1120.8033, 705.4768, 650.91516]
2025-05-06 23:24:43,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [700.0, 225.0, 280.0, 537.0, 555.0, 291.0, 110.0, 432.0, 288.0, 268.0]
2025-05-06 23:24:43,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 44 minutes, 56 seconds)
2025-05-06 23:37:12,439 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:37:13,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:39:27,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 995.09052 ± 417.939
2025-05-06 23:39:27,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1259.6285, 806.8769, 1444.9818, 596.7944, 633.72327, 705.1119, 752.7073, 995.5097, 1975.2598, 780.3119]
2025-05-06 23:39:27,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [420.0, 305.0, 512.0, 224.0, 251.0, 275.0, 281.0, 353.0, 652.0, 300.0]
2025-05-06 23:39:27,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 22 minutes, 53 seconds)
2025-05-06 23:51:21,653 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:51:22,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:53:10,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 754.62329 ± 374.888
2025-05-06 23:53:10,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1171.652, 624.6901, 646.5199, 653.5901, 645.354, 789.83026, 623.523, 779.3408, 1566.2922, 45.440918]
2025-05-06 23:53:10,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [399.0, 236.0, 241.0, 257.0, 320.0, 267.0, 261.0, 292.0, 548.0, 44.0]
2025-05-06 23:53:10,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 36/100 (estimated time remaining: 15 hours, 50 minutes, 13 seconds)
2025-05-07 00:05:36,162 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:05:36,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:07:18,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 703.78766 ± 234.690
2025-05-07 00:07:18,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [671.1268, 801.5524, 126.80336, 1066.3774, 947.0055, 684.036, 602.6419, 675.5567, 774.3399, 688.43646]
2025-05-07 00:07:18,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [266.0, 308.0, 91.0, 408.0, 341.0, 291.0, 226.0, 252.0, 280.0, 258.0]
2025-05-07 00:07:18,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 19 minutes, 38 seconds)
2025-05-07 00:19:11,496 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:19:11,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:20:53,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 857.80457 ± 265.563
2025-05-07 00:20:55,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [669.08984, 707.6426, 927.1057, 726.4525, 749.45624, 693.7137, 1493.733, 688.50354, 1217.9958, 704.35254]
2025-05-07 00:20:55,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [277.0, 279.0, 352.0, 287.0, 299.0, 266.0, 526.0, 278.0, 487.0, 259.0]
2025-05-07 00:20:55,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 38/100 (estimated time remaining: 14 hours, 59 minutes, 38 seconds)
2025-05-07 00:33:07,819 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:33:07,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:34:54,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 866.31897 ± 228.823
2025-05-07 00:34:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [816.63763, 697.30835, 1253.0044, 616.0617, 714.1652, 1113.0471, 1227.5532, 799.576, 628.128, 797.70776]
2025-05-07 00:34:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [432.0, 288.0, 465.0, 250.0, 284.0, 434.0, 500.0, 318.0, 260.0, 306.0]
2025-05-07 00:34:54,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 39/100 (estimated time remaining: 14 hours, 30 minutes, 16 seconds)
2025-05-07 00:46:39,181 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:46:39,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:48:20,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 721.40216 ± 226.336
2025-05-07 00:48:21,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [768.69934, 631.0973, 579.7362, 571.8607, 212.98466, 872.19574, 822.48615, 733.48566, 1025.4595, 996.01605]
2025-05-07 00:48:21,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [312.0, 245.0, 230.0, 214.0, 119.0, 309.0, 339.0, 270.0, 311.0, 356.0]
2025-05-07 00:48:21,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 33 seconds)
2025-05-07 01:00:01,931 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:00:01,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:01:39,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 838.88104 ± 237.605
2025-05-07 01:01:39,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [713.6574, 701.5278, 1499.5387, 966.92487, 785.8549, 637.0973, 835.84125, 701.5294, 717.85254, 828.9868]
2025-05-07 01:01:39,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [264.0, 268.0, 498.0, 366.0, 318.0, 252.0, 356.0, 297.0, 281.0, 344.0]
2025-05-07 01:01:39,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 41/100 (estimated time remaining: 13 hours, 41 minutes, 47 seconds)
2025-05-07 01:13:17,212 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:13:17,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:15:03,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 943.07159 ± 338.904
2025-05-07 01:15:03,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1265.5615, 572.9674, 808.684, 672.98236, 1463.2305, 1109.1582, 691.3079, 689.87024, 657.60956, 1499.3441]
2025-05-07 01:15:03,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [431.0, 222.0, 319.0, 257.0, 490.0, 418.0, 259.0, 279.0, 261.0, 511.0]
2025-05-07 01:15:03,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 42/100 (estimated time remaining: 13 hours, 19 minutes, 25 seconds)
2025-05-07 01:27:25,062 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:27:25,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:29:11,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 746.34949 ± 336.430
2025-05-07 01:29:11,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [770.53436, 643.1666, 644.6054, 22.134474, 603.3489, 1280.6945, 1114.0105, 664.8266, 1096.85, 623.32306]
2025-05-07 01:29:11,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [297.0, 261.0, 282.0, 31.0, 226.0, 443.0, 417.0, 265.0, 358.0, 229.0]
2025-05-07 01:29:11,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 43/100 (estimated time remaining: 13 hours, 11 minutes, 43 seconds)
2025-05-07 01:41:44,530 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:41:44,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:43:23,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 688.32666 ± 57.344
2025-05-07 01:43:23,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [640.3817, 812.73883, 720.8787, 649.8906, 666.2989, 634.90326, 610.7922, 738.4838, 714.0103, 694.8884]
2025-05-07 01:43:23,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [242.0, 303.0, 280.0, 242.0, 256.0, 236.0, 226.0, 284.0, 287.0, 278.0]
2025-05-07 01:43:23,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 44/100 (estimated time remaining: 13 hours, 42 seconds)
2025-05-07 01:55:50,016 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:55:50,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:57:15,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 735.71368 ± 220.335
2025-05-07 01:57:15,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [600.8018, 701.9905, 609.318, 647.2536, 637.6404, 712.7042, 609.3977, 772.2597, 687.72186, 1378.0494]
2025-05-07 01:57:15,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [223.0, 282.0, 230.0, 256.0, 249.0, 287.0, 229.0, 294.0, 275.0, 467.0]
2025-05-07 01:57:15,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 45/100 (estimated time remaining: 12 hours, 51 minutes, 43 seconds)
2025-05-07 02:08:29,763 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:08:30,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:10:39,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 945.79230 ± 239.851
2025-05-07 02:10:39,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [769.59827, 1035.8474, 737.8414, 820.5667, 1227.1918, 795.86664, 1358.0782, 697.6028, 746.60205, 1268.7277]
2025-05-07 02:10:39,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [291.0, 374.0, 284.0, 310.0, 451.0, 285.0, 449.0, 257.0, 290.0, 429.0]
2025-05-07 02:10:39,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 46/100 (estimated time remaining: 12 hours, 39 minutes, 6 seconds)
2025-05-07 02:23:07,584 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:23:07,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:25:01,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 783.97333 ± 161.534
2025-05-07 02:25:01,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [930.28424, 905.11926, 757.29785, 681.22504, 830.53125, 529.92236, 699.73456, 1124.2596, 633.35803, 748.0015]
2025-05-07 02:25:01,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [333.0, 359.0, 301.0, 274.0, 315.0, 205.0, 304.0, 403.0, 252.0, 313.0]
2025-05-07 02:25:01,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 47/100 (estimated time remaining: 12 hours, 35 minutes, 41 seconds)
2025-05-07 02:37:12,113 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:37:12,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:39:10,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1101.18506 ± 527.231
2025-05-07 02:39:10,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1062.1727, 620.33997, 812.63635, 1233.4921, 994.9303, 572.5385, 1881.4929, 645.0886, 942.01404, 2247.1453]
2025-05-07 02:39:10,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [383.0, 259.0, 286.0, 406.0, 352.0, 226.0, 553.0, 258.0, 341.0, 795.0]
2025-05-07 02:39:10,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 48/100 (estimated time remaining: 12 hours, 21 minutes, 53 seconds)
2025-05-07 02:51:41,075 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:51:41,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:54:06,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1032.73669 ± 103.737
2025-05-07 02:54:06,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1129.9506, 1013.1412, 1079.9226, 950.536, 951.25635, 834.78107, 1212.6091, 1100.1827, 972.6626, 1082.3258]
2025-05-07 02:54:06,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [383.0, 370.0, 381.0, 335.0, 364.0, 422.0, 420.0, 424.0, 346.0, 382.0]
2025-05-07 02:54:06,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 15 minutes, 24 seconds)
2025-05-07 03:06:31,991 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:06:31,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:08:33,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 853.71552 ± 355.709
2025-05-07 03:08:33,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [630.2382, 644.642, 699.76776, 621.43787, 1349.1582, 621.6858, 646.53986, 684.4887, 939.44836, 1699.7485]
2025-05-07 03:08:33,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [243.0, 269.0, 268.0, 235.0, 502.0, 252.0, 245.0, 262.0, 392.0, 531.0]
2025-05-07 03:08:33,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 7 minutes, 9 seconds)
2025-05-07 03:21:16,989 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:21:16,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:22:59,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 838.04163 ± 311.592
2025-05-07 03:22:59,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1229.9575, 709.27484, 702.0006, 863.06976, 600.6945, 1383.1951, 562.238, 1176.93, 800.542, 352.51382]
2025-05-07 03:22:59,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [435.0, 298.0, 267.0, 334.0, 234.0, 479.0, 205.0, 435.0, 315.0, 439.0]
2025-05-07 03:22:59,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 3 minutes, 17 seconds)
2025-05-07 03:35:43,208 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:35:43,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:37:53,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 952.32990 ± 402.632
2025-05-07 03:37:53,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [660.81, 36.278076, 1094.5753, 1230.9242, 671.23566, 758.5374, 1429.043, 1392.5278, 1157.3254, 1092.0417]
2025-05-07 03:37:53,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [244.0, 48.0, 421.0, 437.0, 253.0, 294.0, 472.0, 516.0, 388.0, 411.0]
2025-05-07 03:37:53,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 52/100 (estimated time remaining: 11 hours, 54 minutes, 6 seconds)
2025-05-07 03:50:47,662 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:50:47,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:52:52,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 939.66913 ± 286.566
2025-05-07 03:52:53,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [666.10803, 614.65, 1326.4214, 677.1092, 1225.2848, 823.2472, 770.583, 722.59216, 1364.0332, 1206.663]
2025-05-07 03:52:53,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [252.0, 234.0, 422.0, 280.0, 441.0, 270.0, 284.0, 255.0, 454.0, 424.0]
2025-05-07 03:52:53,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 53/100 (estimated time remaining: 11 hours, 47 minutes, 44 seconds)
2025-05-07 04:05:16,585 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:05:16,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:07:07,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1048.05566 ± 417.583
2025-05-07 04:07:07,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1055.0706, 723.8696, 686.17737, 1907.1967, 693.0884, 804.51825, 978.50183, 636.4796, 1549.7576, 1445.8967]
2025-05-07 04:07:07,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [397.0, 270.0, 258.0, 614.0, 265.0, 307.0, 331.0, 231.0, 486.0, 445.0]
2025-05-07 04:07:07,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 26 minutes, 22 seconds)
2025-05-07 04:19:29,846 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:19:30,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:21:18,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 778.39966 ± 130.161
2025-05-07 04:21:18,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [721.2656, 721.23944, 1002.58636, 685.9573, 722.3964, 1036.284, 661.3532, 820.7383, 771.2475, 640.92816]
2025-05-07 04:21:18,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [278.0, 282.0, 343.0, 248.0, 278.0, 357.0, 245.0, 323.0, 289.0, 247.0]
2025-05-07 04:21:18,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 9 minutes, 24 seconds)
2025-05-07 04:33:30,764 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:33:30,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:35:28,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 892.72815 ± 244.005
2025-05-07 04:35:28,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [707.2146, 600.9847, 922.4668, 1242.8397, 1054.3998, 629.24445, 1320.2687, 699.78845, 740.5189, 1009.5548]
2025-05-07 04:35:28,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [248.0, 239.0, 322.0, 441.0, 345.0, 225.0, 429.0, 243.0, 300.0, 338.0]
2025-05-07 04:35:28,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 56/100 (estimated time remaining: 10 hours, 52 minutes, 17 seconds)
2025-05-07 04:48:24,101 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:48:24,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:50:12,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 805.60645 ± 265.891
2025-05-07 04:50:12,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [658.4244, 836.3386, 696.37286, 1222.7513, 1092.2872, 605.0925, 1081.9164, 290.13202, 657.4631, 915.2864]
2025-05-07 04:50:12,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [239.0, 283.0, 270.0, 413.0, 384.0, 215.0, 335.0, 169.0, 244.0, 331.0]
2025-05-07 04:50:12,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 57/100 (estimated time remaining: 10 hours, 36 minutes, 24 seconds)
2025-05-07 05:02:04,580 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:02:04,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:03:39,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 922.26660 ± 248.004
2025-05-07 05:03:39,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [707.5599, 998.764, 742.95123, 1454.4214, 1042.3904, 755.6041, 675.0529, 1052.6809, 1147.7389, 645.5015]
2025-05-07 05:03:39,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [248.0, 332.0, 243.0, 447.0, 343.0, 260.0, 248.0, 346.0, 374.0, 244.0]
2025-05-07 05:03:39,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 8 minutes, 35 seconds)
2025-05-07 05:16:23,418 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:16:23,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:18:38,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1064.12012 ± 202.295
2025-05-07 05:18:38,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1294.1548, 1116.1418, 1112.675, 1084.6768, 1185.9487, 949.11334, 776.75476, 693.12213, 1044.8514, 1383.7621]
2025-05-07 05:18:38,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [456.0, 382.0, 370.0, 351.0, 401.0, 318.0, 284.0, 236.0, 359.0, 419.0]
2025-05-07 05:18:38,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 45 seconds)
2025-05-07 05:31:16,196 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:31:16,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:33:49,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1257.38501 ± 185.893
2025-05-07 05:33:51,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1217.2296, 1435.1428, 894.2915, 1476.5243, 1253.5593, 1494.5188, 1049.2957, 1393.1329, 1215.8906, 1144.2639]
2025-05-07 05:33:51,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [407.0, 471.0, 285.0, 478.0, 397.0, 470.0, 348.0, 438.0, 406.0, 372.0]
2025-05-07 05:33:51,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 54 minutes, 50 seconds)
2025-05-07 05:45:26,523 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:45:26,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:47:40,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1127.40198 ± 410.368
2025-05-07 05:47:40,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1306.6299, 1090.2949, 1198.5366, 22.796518, 946.96686, 1597.7524, 1516.6001, 1242.468, 1143.745, 1208.23]
2025-05-07 05:47:40,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [411.0, 350.0, 368.0, 32.0, 310.0, 492.0, 449.0, 387.0, 364.0, 394.0]
2025-05-07 05:47:40,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 37 minutes, 39 seconds)
2025-05-07 05:59:24,319 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:59:24,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:01:38,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1376.57251 ± 342.628
2025-05-07 06:01:38,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1939.8839, 1648.6849, 1376.0273, 1109.9347, 1398.5558, 1195.0016, 661.58673, 1478.0343, 1739.334, 1218.6823]
2025-05-07 06:01:38,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [596.0, 508.0, 444.0, 365.0, 466.0, 412.0, 211.0, 464.0, 528.0, 416.0]
2025-05-07 06:01:38,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 17 minutes, 6 seconds)
2025-05-07 06:13:50,395 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:13:50,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:15:53,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1300.02466 ± 485.484
2025-05-07 06:15:53,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1440.6447, 1461.908, 1659.4435, 798.4309, 1454.9048, 800.8937, 1573.7467, 1895.7429, 1661.6445, 252.88712]
2025-05-07 06:15:53,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [436.0, 443.0, 513.0, 262.0, 440.0, 259.0, 497.0, 573.0, 514.0, 115.0]
2025-05-07 06:15:53,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 8 minutes, 52 seconds)
2025-05-07 06:27:17,322 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:27:17,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:29:22,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1013.17078 ± 154.230
2025-05-07 06:29:22,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1087.7845, 904.802, 1028.5411, 1302.2068, 1266.1334, 872.3412, 898.9163, 945.72015, 999.627, 825.63574]
2025-05-07 06:29:22,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [356.0, 307.0, 324.0, 395.0, 420.0, 301.0, 290.0, 322.0, 334.0, 282.0]
2025-05-07 06:29:22,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 43 minutes, 24 seconds)
2025-05-07 06:40:36,530 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:40:36,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:42:29,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1120.90857 ± 182.515
2025-05-07 06:42:30,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1133.9252, 1182.9556, 1057.0521, 1349.1965, 1095.5021, 1136.768, 1310.2732, 1316.3628, 880.5229, 746.52795]
2025-05-07 06:42:30,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [380.0, 401.0, 336.0, 424.0, 382.0, 379.0, 429.0, 442.0, 319.0, 302.0]
2025-05-07 06:42:30,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 14 minutes, 17 seconds)
2025-05-07 06:54:45,944 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:54:45,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:57:04,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1462.11670 ± 331.868
2025-05-07 06:57:04,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [970.9413, 1168.8069, 1182.309, 2118.6633, 1855.341, 1512.0497, 1306.6703, 1694.5469, 1513.4182, 1298.4207]
2025-05-07 06:57:04,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [325.0, 365.0, 357.0, 619.0, 557.0, 464.0, 423.0, 511.0, 459.0, 421.0]
2025-05-07 06:57:04,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 5 minutes, 49 seconds)
2025-05-07 07:09:37,310 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:09:37,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:12:07,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1236.89905 ± 175.402
2025-05-07 07:12:07,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1078.7178, 1175.2284, 1358.6067, 1089.805, 1122.6245, 1554.1688, 1036.1812, 1521.6305, 1159.2932, 1272.7349]
2025-05-07 07:12:07,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [355.0, 387.0, 409.0, 370.0, 373.0, 486.0, 342.0, 468.0, 367.0, 394.0]
2025-05-07 07:12:07,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 59 minutes, 15 seconds)
2025-05-07 07:23:34,732 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:23:34,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:26:17,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1352.59155 ± 425.272
2025-05-07 07:26:17,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1412.9679, 1786.9989, 1220.8427, 1107.241, 678.50336, 733.4443, 1962.0154, 1633.3472, 1171.2631, 1819.2935]
2025-05-07 07:26:17,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [447.0, 553.0, 406.0, 350.0, 269.0, 274.0, 594.0, 494.0, 398.0, 569.0]
2025-05-07 07:26:17,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 44 minutes, 41 seconds)
2025-05-07 07:38:11,944 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:38:12,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:40:47,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1294.49927 ± 261.020
2025-05-07 07:40:47,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1748.443, 1157.658, 1341.6472, 1321.49, 1076.2517, 1487.8173, 835.64496, 1234.5739, 1103.5178, 1637.9502]
2025-05-07 07:40:47,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [523.0, 375.0, 433.0, 435.0, 357.0, 468.0, 290.0, 388.0, 363.0, 505.0]
2025-05-07 07:40:47,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 37 minutes, 1 second)
2025-05-07 07:52:07,032 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:52:07,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:54:57,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1470.03503 ± 270.266
2025-05-07 07:54:57,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1291.1454, 1330.8408, 2208.4983, 1297.7052, 1312.6605, 1592.918, 1345.0226, 1547.7141, 1506.689, 1267.1552]
2025-05-07 07:54:57,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [392.0, 411.0, 667.0, 397.0, 403.0, 480.0, 425.0, 480.0, 469.0, 405.0]
2025-05-07 07:54:57,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1470.04) for latency ExtremeSparseL4U32
2025-05-07 07:54:57,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-07 07:54:57,745 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 07:54:57,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 29 minutes, 13 seconds)
2025-05-07 08:08:07,478 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:08:07,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:10:20,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1362.01794 ± 225.906
2025-05-07 08:10:20,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1686.1515, 1105.6549, 1286.6277, 1237.0294, 1602.6342, 1447.5435, 1018.3325, 1436.3475, 1651.441, 1148.416]
2025-05-07 08:10:20,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [525.0, 360.0, 393.0, 399.0, 499.0, 456.0, 345.0, 466.0, 527.0, 368.0]
2025-05-07 08:10:20,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 19 minutes, 35 seconds)
2025-05-07 08:22:42,465 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:22:42,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:25:42,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1461.90112 ± 286.746
2025-05-07 08:25:42,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1441.313, 1471.623, 1451.606, 1595.7087, 1460.0038, 1649.7479, 684.9728, 1378.4352, 1746.1356, 1739.4648]
2025-05-07 08:25:42,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [464.0, 455.0, 463.0, 528.0, 489.0, 539.0, 239.0, 463.0, 584.0, 565.0]
2025-05-07 08:25:42,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 6 minutes, 47 seconds)
2025-05-07 08:37:53,801 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:37:53,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:40:37,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1363.43201 ± 53.687
2025-05-07 08:40:37,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1396.0098, 1282.6603, 1361.2247, 1344.3528, 1303.167, 1350.7031, 1485.0272, 1340.3346, 1405.2861, 1365.5544]
2025-05-07 08:40:37,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [467.0, 401.0, 451.0, 428.0, 421.0, 422.0, 452.0, 424.0, 461.0, 444.0]
2025-05-07 08:40:37,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 56 minutes, 14 seconds)
2025-05-07 08:53:38,117 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:53:38,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:56:23,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1342.78528 ± 343.457
2025-05-07 08:56:23,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1171.5518, 1422.1152, 1351.534, 1225.5876, 750.563, 1339.3306, 985.22437, 2061.9043, 1701.614, 1418.4274]
2025-05-07 08:56:23,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [393.0, 477.0, 414.0, 406.0, 264.0, 434.0, 338.0, 637.0, 569.0, 455.0]
2025-05-07 08:56:23,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 48 minutes, 15 seconds)
2025-05-07 09:08:11,040 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:08:11,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:10:34,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1152.71448 ± 494.904
2025-05-07 09:10:34,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1342.2428, 1135.1991, 1849.0118, 741.4524, 11.220381, 1251.6892, 1061.5258, 1801.065, 1259.6823, 1074.0553]
2025-05-07 09:10:34,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [414.0, 364.0, 589.0, 278.0, 23.0, 431.0, 362.0, 555.0, 416.0, 355.0]
2025-05-07 09:10:34,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 33 minutes, 11 seconds)
2025-05-07 09:21:59,142 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:21:59,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:24:32,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1290.17444 ± 391.699
2025-05-07 09:24:32,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [419.3584, 1305.1312, 1117.5233, 1115.2473, 1915.7616, 1162.3735, 1299.5524, 1774.0342, 1555.8679, 1236.8938]
2025-05-07 09:24:32,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [141.0, 429.0, 347.0, 369.0, 580.0, 361.0, 400.0, 552.0, 478.0, 406.0]
2025-05-07 09:24:32,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 10 minutes, 57 seconds)
2025-05-07 09:37:20,402 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:37:20,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:39:52,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1596.42163 ± 360.603
2025-05-07 09:39:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2409.866, 1298.3535, 1316.2208, 1443.2933, 1385.7797, 1637.9946, 1529.7961, 2014.4275, 1157.9398, 1770.5447]
2025-05-07 09:39:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [719.0, 404.0, 394.0, 472.0, 446.0, 485.0, 495.0, 629.0, 385.0, 537.0]
2025-05-07 09:39:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1596.42) for latency ExtremeSparseL4U32
2025-05-07 09:39:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-07 09:39:52,665 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 09:39:52,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 56 minutes, 2 seconds)
2025-05-07 09:52:06,236 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:52:06,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:55:14,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1637.95605 ± 242.827
2025-05-07 09:55:14,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1686.4636, 1337.797, 1566.6864, 2008.5262, 1451.5802, 1760.3644, 1607.4105, 1453.5839, 1404.7039, 2102.444]
2025-05-07 09:55:14,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [519.0, 410.0, 473.0, 616.0, 439.0, 526.0, 520.0, 425.0, 437.0, 637.0]
2025-05-07 09:55:14,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1637.96) for latency ExtremeSparseL4U32
2025-05-07 09:55:14,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-07 09:55:14,423 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 09:55:14,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 43 minutes, 14 seconds)
2025-05-07 10:07:37,080 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:07:37,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:09:43,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1298.58936 ± 234.232
2025-05-07 10:09:43,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1481.0331, 1434.0714, 1206.1049, 1332.9528, 1612.9972, 1183.5779, 863.92505, 1137.1976, 1092.3855, 1641.6484]
2025-05-07 10:09:43,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [463.0, 446.0, 372.0, 417.0, 483.0, 370.0, 298.0, 386.0, 381.0, 501.0]
2025-05-07 10:09:43,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 22 minutes, 39 seconds)
2025-05-07 10:21:57,305 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:21:57,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:24:03,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1331.32739 ± 372.234
2025-05-07 10:24:03,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1856.4623, 1200.6083, 1119.165, 1422.1501, 1179.6223, 1296.144, 1599.9819, 481.71768, 1356.4485, 1800.9738]
2025-05-07 10:24:03,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [557.0, 379.0, 349.0, 454.0, 378.0, 428.0, 487.0, 168.0, 429.0, 557.0]
2025-05-07 10:24:03,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 8 minutes, 36 seconds)
2025-05-07 10:36:49,606 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:36:49,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:39:36,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1410.64038 ± 330.601
2025-05-07 10:39:36,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1476.3888, 731.5791, 1450.0896, 1643.4803, 2046.4001, 1287.3254, 1299.7415, 1382.0972, 1658.0236, 1131.2798]
2025-05-07 10:39:36,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [448.0, 264.0, 432.0, 520.0, 642.0, 432.0, 424.0, 409.0, 522.0, 358.0]
2025-05-07 10:39:36,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 16 seconds)
2025-05-07 10:52:36,161 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:52:36,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:55:20,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1420.11707 ± 204.286
2025-05-07 10:55:22,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1592.5294, 1524.6943, 1414.3062, 1339.8917, 1551.1417, 1463.7789, 1106.8477, 1812.2015, 1182.3137, 1213.4664]
2025-05-07 10:55:22,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [503.0, 456.0, 423.0, 418.0, 494.0, 441.0, 342.0, 553.0, 378.0, 378.0]
2025-05-07 10:55:22,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 46 minutes, 51 seconds)
2025-05-07 11:07:08,008 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:07:08,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:09:55,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1410.62524 ± 376.921
2025-05-07 11:09:55,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1772.4377, 1103.9613, 1092.6158, 2147.0706, 1498.4952, 961.5175, 1338.5846, 1849.8696, 1292.771, 1048.9291]
2025-05-07 11:09:55,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [531.0, 349.0, 349.0, 701.0, 466.0, 323.0, 448.0, 568.0, 411.0, 357.0]
2025-05-07 11:09:55,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 28 minutes, 51 seconds)
2025-05-07 11:22:52,533 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:22:52,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:25:54,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1573.28088 ± 284.251
2025-05-07 11:25:54,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1678.7611, 1488.8157, 1128.6774, 1458.9916, 1835.8688, 1475.2242, 1504.6802, 1776.6814, 2154.0151, 1231.0939]
2025-05-07 11:25:54,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [519.0, 464.0, 365.0, 447.0, 557.0, 461.0, 455.0, 531.0, 653.0, 408.0]
2025-05-07 11:25:54,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 19 minutes, 1 second)
2025-05-07 11:38:33,196 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:38:33,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:41:22,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1433.61194 ± 295.278
2025-05-07 11:41:22,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1320.0153, 2011.3889, 1362.6821, 1982.5547, 1326.1571, 1288.4514, 1269.4349, 1342.0509, 1385.1031, 1048.28]
2025-05-07 11:41:22,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [411.0, 629.0, 446.0, 637.0, 409.0, 409.0, 423.0, 429.0, 434.0, 346.0]
2025-05-07 11:41:22,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 7 minutes, 24 seconds)
2025-05-07 11:53:41,401 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:53:41,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:56:12,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1283.83032 ± 519.892
2025-05-07 11:56:12,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1395.3284, 1762.4391, 1680.3206, 21.170084, 1153.6196, 1675.0074, 1488.8324, 1656.9319, 685.9673, 1318.6863]
2025-05-07 11:56:12,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [435.0, 537.0, 504.0, 33.0, 376.0, 517.0, 447.0, 521.0, 243.0, 440.0]
2025-05-07 11:56:12,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 49 minutes, 48 seconds)
2025-05-07 12:09:28,783 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:09:28,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:11:34,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1309.71509 ± 172.653
2025-05-07 12:11:34,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1225.9696, 1331.3639, 1399.6317, 1323.6832, 1386.2574, 1038.643, 1662.9016, 1450.5304, 1129.2769, 1148.8939]
2025-05-07 12:11:34,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [371.0, 426.0, 440.0, 408.0, 427.0, 349.0, 550.0, 459.0, 363.0, 404.0]
2025-05-07 12:11:34,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 33 minutes, 22 seconds)
2025-05-07 12:21:50,995 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:21:50,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:24:30,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1707.69861 ± 445.584
2025-05-07 12:24:30,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2148.0972, 2742.3367, 1605.4015, 1846.097, 1299.7305, 1619.3728, 1703.8387, 1112.0953, 1695.1205, 1304.8969]
2025-05-07 12:24:30,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [670.0, 812.0, 551.0, 558.0, 394.0, 487.0, 535.0, 366.0, 529.0, 403.0]
2025-05-07 12:24:30,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1707.70) for latency ExtremeSparseL4U32
2025-05-07 12:24:30,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-07 12:24:30,804 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 12:24:30,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 13 minutes, 56 seconds)
2025-05-07 12:34:21,340 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:34:21,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:36:40,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1467.36169 ± 341.162
2025-05-07 12:36:40,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1714.6385, 1669.1553, 1417.9423, 1273.7878, 2152.1443, 1614.8, 938.0692, 1509.0278, 970.7037, 1413.3478]
2025-05-07 12:36:40,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [519.0, 519.0, 429.0, 392.0, 652.0, 508.0, 316.0, 463.0, 329.0, 439.0]
2025-05-07 12:36:40,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 49 minutes, 49 seconds)
2025-05-07 12:48:18,312 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:48:18,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:51:26,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1614.47656 ± 356.844
2025-05-07 12:51:27,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1516.9596, 1481.6125, 1809.2976, 1916.3783, 1135.4857, 1284.947, 1516.7358, 2428.84, 1740.718, 1313.7913]
2025-05-07 12:51:27,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [490.0, 446.0, 588.0, 565.0, 360.0, 398.0, 492.0, 743.0, 562.0, 408.0]
2025-05-07 12:51:27,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 34 minutes, 11 seconds)
2025-05-07 13:02:58,760 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:02:58,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:04:55,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1263.62817 ± 241.120
2025-05-07 13:04:55,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [869.8675, 1000.2122, 1175.0406, 1341.3438, 1434.0586, 1246.8661, 1418.951, 1777.086, 1102.5956, 1270.2609]
2025-05-07 13:04:55,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [292.0, 322.0, 364.0, 407.0, 418.0, 382.0, 424.0, 516.0, 360.0, 415.0]
2025-05-07 13:04:55,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 17 minutes, 25 seconds)
2025-05-07 13:15:45,772 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:15:45,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:18:37,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1840.57812 ± 386.784
2025-05-07 13:18:37,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1647.8727, 1621.0093, 2323.4275, 2634.1814, 2057.5095, 1456.8231, 1889.9951, 1338.5762, 1532.4658, 1903.9226]
2025-05-07 13:18:38,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [510.0, 490.0, 718.0, 760.0, 626.0, 438.0, 601.0, 448.0, 456.0, 560.0]
2025-05-07 13:18:38,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1124 [INFO]: New best (1840.58) for latency ExtremeSparseL4U32
2025-05-07 13:18:39,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1127 [INFO]: saving network
2025-05-07 13:18:39,983 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-walker2d/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 13:18:40,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 47 seconds)
2025-05-07 13:30:38,938 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:30:38,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:32:37,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1276.71460 ± 175.027
2025-05-07 13:32:37,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1661.6091, 1125.8281, 1155.8639, 1065.4678, 1284.4857, 1159.8441, 1192.7083, 1507.2175, 1291.98, 1322.1425]
2025-05-07 13:32:37,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [487.0, 341.0, 356.0, 345.0, 373.0, 359.0, 381.0, 464.0, 425.0, 404.0]
2025-05-07 13:32:37,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 48 minutes, 58 seconds)
2025-05-07 13:44:27,418 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:44:27,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:47:02,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1329.78296 ± 305.257
2025-05-07 13:47:03,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1745.1133, 1799.5223, 1298.6587, 794.39923, 1001.26166, 1202.2899, 1493.8466, 1287.5547, 1559.0157, 1116.1681]
2025-05-07 13:47:03,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [523.0, 531.0, 418.0, 288.0, 333.0, 377.0, 462.0, 390.0, 486.0, 352.0]
2025-05-07 13:47:04,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 38 minutes, 34 seconds)
2025-05-07 13:58:52,218 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:58:52,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:01:17,677 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1536.08728 ± 402.229
2025-05-07 14:01:17,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1612.933, 904.2625, 1370.6906, 2229.69, 967.08716, 1905.8438, 1913.6531, 1585.2577, 1634.1434, 1237.311]
2025-05-07 14:01:17,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [518.0, 321.0, 436.0, 661.0, 317.0, 593.0, 616.0, 476.0, 504.0, 408.0]
2025-05-07 14:01:17,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 23 minutes, 48 seconds)
2025-05-07 14:11:48,544 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:11:48,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:15:00,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1628.83923 ± 419.215
2025-05-07 14:15:00,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1294.4418, 1303.9485, 1878.4126, 1337.4159, 1870.513, 2594.1965, 1432.126, 1264.952, 1317.1521, 1995.2349]
2025-05-07 14:15:00,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [388.0, 417.0, 602.0, 408.0, 604.0, 830.0, 478.0, 380.0, 429.0, 612.0]
2025-05-07 14:15:00,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 10 minutes, 5 seconds)
2025-05-07 14:27:35,525 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:27:35,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:30:50,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1678.53540 ± 492.914
2025-05-07 14:30:51,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2725.3933, 1553.2383, 2166.7258, 1249.8944, 1073.5183, 1183.7003, 2150.911, 1471.7158, 1540.453, 1669.8024]
2025-05-07 14:30:51,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [824.0, 460.0, 658.0, 387.0, 340.0, 414.0, 655.0, 433.0, 455.0, 519.0]
2025-05-07 14:30:52,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 97/100 (estimated time remaining: 57 minutes, 44 seconds)
2025-05-07 14:43:55,863 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:43:55,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:47:24,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1783.45740 ± 501.837
2025-05-07 14:47:25,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [2652.0935, 966.26086, 1571.8854, 2038.3981, 1897.1718, 2350.976, 1796.686, 2024.6875, 1092.1813, 1444.2338]
2025-05-07 14:47:25,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [827.0, 300.0, 503.0, 615.0, 580.0, 723.0, 565.0, 626.0, 347.0, 446.0]
2025-05-07 14:47:25,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 98/100 (estimated time remaining: 44 minutes, 52 seconds)
2025-05-07 15:00:09,955 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:00:09,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:03:08,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1569.97693 ± 398.895
2025-05-07 15:03:08,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1505.0908, 1582.7217, 1213.6224, 1303.7935, 1276.0359, 2408.02, 1492.169, 1928.4626, 1971.5881, 1018.2661]
2025-05-07 15:03:08,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [444.0, 487.0, 413.0, 398.0, 383.0, 707.0, 456.0, 568.0, 583.0, 306.0]
2025-05-07 15:03:08,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 25 seconds)
2025-05-07 15:15:39,374 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:15:39,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:18:45,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1575.13208 ± 392.081
2025-05-07 15:18:45,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1861.8528, 1502.025, 1524.6475, 1709.4075, 2070.8193, 568.96277, 1318.4534, 1607.0975, 1729.4978, 1858.5562]
2025-05-07 15:18:45,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [589.0, 456.0, 505.0, 525.0, 624.0, 196.0, 403.0, 514.0, 527.0, 599.0]
2025-05-07 15:18:45,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1097 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 29 seconds)
2025-05-07 15:29:39,700 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:29:39,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:31:47,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1119 [DEBUG]: Total Reward: 1332.49231 ± 327.235
2025-05-07 15:31:47,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1120 [DEBUG]: All rewards: [1293.3785, 1341.0552, 1364.0168, 1358.1417, 2149.6257, 1136.4983, 1240.4062, 775.0779, 1478.7217, 1187.9996]
2025-05-07 15:31:47,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1121 [DEBUG]: All trajectory lengths: [410.0, 408.0, 422.0, 447.0, 663.0, 350.0, 386.0, 287.0, 446.0, 402.0]
2025-05-07 15:31:47,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-walker2d):1149 [DEBUG]: Training session finished
