2025-05-06 15:36:40,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-05-06 15:36:40,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-05-06 15:36:40,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1494e64c4590>}
2025-05-06 15:36:40,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1009 [DEBUG]: using device: cuda
2025-05-06 15:36:40,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-06 15:36:40,163 baseline-mbpac-noisy-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-06 15:36:40,163 baseline-mbpac-noisy-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 15:36:40,169 baseline-mbpac-noisy-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-05-06 15:36:41,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-06 15:36:41,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-06 15:47:08,548 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:47:08,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:47:25,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 98.56706 ± 28.933
2025-05-06 15:47:25,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [115.649765, 69.65139, 101.77537, 150.21584, 97.75366, 99.73898, 132.05486, 46.72327, 71.20296, 100.90452]
2025-05-06 15:47:25,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [63.0, 43.0, 58.0, 74.0, 56.0, 56.0, 68.0, 27.0, 54.0, 57.0]
2025-05-06 15:47:25,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (98.57) for latency ExtremeSparseL4U32
2025-05-06 15:47:25,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:47:25,515 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 15:47:26,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 44 minutes, 22 seconds)
2025-05-06 15:57:51,125 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:57:51,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:58:18,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 173.34712 ± 41.486
2025-05-06 15:58:18,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [282.9146, 156.28983, 148.70013, 132.0032, 157.03778, 173.94466, 141.14845, 207.92598, 169.20752, 164.29912]
2025-05-06 15:58:18,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [133.0, 79.0, 82.0, 73.0, 80.0, 92.0, 76.0, 97.0, 105.0, 86.0]
2025-05-06 15:58:18,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (173.35) for latency ExtremeSparseL4U32
2025-05-06 15:58:18,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 15:58:18,045 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 15:58:18,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 39 minutes, 36 seconds)
2025-05-06 16:09:08,819 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:09:09,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:09:42,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 236.61031 ± 87.368
2025-05-06 16:09:42,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [299.67688, 184.83946, 198.43015, 105.22814, 141.96019, 224.03369, 349.42444, 168.8996, 357.78326, 335.82712]
2025-05-06 16:09:42,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [124.0, 104.0, 92.0, 58.0, 70.0, 99.0, 137.0, 79.0, 165.0, 134.0]
2025-05-06 16:09:42,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (236.61) for latency ExtremeSparseL4U32
2025-05-06 16:09:42,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 16:09:42,382 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 16:09:42,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 47 minutes, 44 seconds)
2025-05-06 16:20:13,497 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:20:13,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:20:44,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 221.32214 ± 42.373
2025-05-06 16:20:44,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [175.68404, 263.90793, 197.33556, 286.95306, 259.22894, 175.85805, 213.7078, 153.76755, 249.56165, 237.21684]
2025-05-06 16:20:44,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [81.0, 114.0, 94.0, 128.0, 117.0, 85.0, 97.0, 76.0, 113.0, 107.0]
2025-05-06 16:20:45,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 37 minutes, 56 seconds)
2025-05-06 16:31:18,784 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:31:18,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:31:49,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 172.51877 ± 89.430
2025-05-06 16:31:49,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [168.0609, 285.02954, 148.09892, 356.83932, 152.22066, 239.10359, 139.85468, 90.7599, 63.464092, 81.756035]
2025-05-06 16:31:49,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [92.0, 178.0, 88.0, 171.0, 102.0, 126.0, 77.0, 70.0, 39.0, 48.0]
2025-05-06 16:31:49,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 27 minutes, 32 seconds)
2025-05-06 16:43:16,128 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:43:16,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:43:45,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 201.42166 ± 69.784
2025-05-06 16:43:45,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [264.39703, 146.72215, 106.66404, 168.12444, 261.79172, 168.18213, 262.64044, 253.84523, 291.39145, 90.45805]
2025-05-06 16:43:45,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [125.0, 76.0, 64.0, 86.0, 115.0, 92.0, 111.0, 112.0, 120.0, 54.0]
2025-05-06 16:43:45,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 38 minutes, 50 seconds)
2025-05-06 16:54:08,384 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:54:09,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:54:37,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 188.32378 ± 62.096
2025-05-06 16:54:37,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [174.34274, 153.86823, 165.14691, 99.718796, 182.83293, 120.87288, 290.95612, 300.9515, 217.64612, 176.9016]
2025-05-06 16:54:37,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [83.0, 80.0, 81.0, 56.0, 88.0, 66.0, 129.0, 131.0, 131.0, 86.0]
2025-05-06 16:54:37,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 27 minutes, 21 seconds)
2025-05-06 17:04:58,357 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:04:58,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:05:30,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 240.58115 ± 113.106
2025-05-06 17:05:30,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [178.18967, 500.7403, 383.60464, 188.18974, 236.53899, 78.748505, 236.39276, 237.6609, 194.31854, 171.4275]
2025-05-06 17:05:30,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [84.0, 189.0, 142.0, 85.0, 98.0, 49.0, 108.0, 124.0, 92.0, 82.0]
2025-05-06 17:05:30,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (240.58) for latency ExtremeSparseL4U32
2025-05-06 17:05:30,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:05:30,746 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:05:32,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 7 minutes, 15 seconds)
2025-05-06 17:15:48,420 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:15:48,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:16:21,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 242.69023 ± 135.864
2025-05-06 17:16:21,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [230.94319, 543.7011, 107.28508, 267.41544, 381.2367, 100.195885, 297.7007, 71.46129, 187.53363, 239.42957]
2025-05-06 17:16:21,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [114.0, 193.0, 60.0, 110.0, 161.0, 66.0, 124.0, 62.0, 105.0, 109.0]
2025-05-06 17:16:21,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (242.69) for latency ExtremeSparseL4U32
2025-05-06 17:16:21,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:16:21,310 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:16:21,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 51 minutes, 44 seconds)
2025-05-06 17:26:37,901 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:26:37,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:27:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 354.71426 ± 163.951
2025-05-06 17:27:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [228.3974, 236.64447, 404.0481, 461.3493, 602.37354, 185.75821, 423.13986, 628.79333, 192.8591, 183.7793]
2025-05-06 17:27:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [99.0, 119.0, 145.0, 153.0, 205.0, 87.0, 143.0, 238.0, 88.0, 84.0]
2025-05-06 17:27:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (354.71) for latency ExtremeSparseL4U32
2025-05-06 17:27:18,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:27:18,748 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:27:18,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 38 minutes, 54 seconds)
2025-05-06 17:37:40,548 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:37:40,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:38:11,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 215.82449 ± 130.948
2025-05-06 17:38:11,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [96.52314, 83.36813, 487.70862, 432.2927, 252.30258, 121.92267, 182.09206, 177.36745, 147.70137, 176.96617]
2025-05-06 17:38:11,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [72.0, 62.0, 194.0, 147.0, 105.0, 63.0, 95.0, 81.0, 91.0, 105.0]
2025-05-06 17:38:11,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 8 minutes, 59 seconds)
2025-05-06 17:48:39,530 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:48:39,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:49:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 360.05222 ± 173.562
2025-05-06 17:49:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [536.281, 598.27985, 384.99457, 440.53094, 313.02625, 144.16663, 240.92851, 162.53032, 163.47775, 616.3064]
2025-05-06 17:49:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [179.0, 196.0, 140.0, 148.0, 121.0, 72.0, 103.0, 81.0, 79.0, 194.0]
2025-05-06 17:49:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (360.05) for latency ExtremeSparseL4U32
2025-05-06 17:49:19,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 17:49:19,244 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 17:49:19,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 2 minutes, 45 seconds)
2025-05-06 17:59:34,513 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:59:34,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:00:19,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 415.82047 ± 150.433
2025-05-06 18:00:19,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [401.51324, 237.37115, 419.62134, 317.45135, 482.21436, 532.91754, 311.95807, 632.8376, 178.22961, 644.09033]
2025-05-06 18:00:19,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [187.0, 108.0, 183.0, 119.0, 158.0, 174.0, 125.0, 196.0, 80.0, 193.0]
2025-05-06 18:00:19,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (415.82) for latency ExtremeSparseL4U32
2025-05-06 18:00:19,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 18:00:19,835 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 18:00:19,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 53 minutes, 25 seconds)
2025-05-06 18:10:35,752 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:10:35,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:11:15,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 363.71512 ± 136.664
2025-05-06 18:11:15,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [84.633896, 372.32303, 215.11241, 526.49994, 255.50403, 479.4288, 446.3803, 311.08728, 501.36188, 444.81955]
2025-05-06 18:11:15,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [57.0, 137.0, 90.0, 168.0, 104.0, 175.0, 156.0, 132.0, 163.0, 154.0]
2025-05-06 18:11:15,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 44 minutes, 28 seconds)
2025-05-06 18:21:33,834 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:21:34,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:22:16,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 329.51471 ± 117.331
2025-05-06 18:22:16,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [229.91327, 331.67038, 114.51106, 228.50096, 314.3195, 325.4528, 565.7347, 420.98502, 355.765, 408.29462]
2025-05-06 18:22:16,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [109.0, 134.0, 62.0, 115.0, 132.0, 157.0, 216.0, 170.0, 130.0, 155.0]
2025-05-06 18:22:16,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 34 minutes, 28 seconds)
2025-05-06 18:32:44,049 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:32:44,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:33:30,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 373.84140 ± 202.322
2025-05-06 18:33:30,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [255.57613, 385.5184, 375.55954, 433.73935, 219.73193, 111.951416, 890.89, 316.96445, 494.06897, 254.41396]
2025-05-06 18:33:30,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [111.0, 144.0, 182.0, 159.0, 96.0, 61.0, 386.0, 144.0, 166.0, 107.0]
2025-05-06 18:33:30,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 29 minutes, 19 seconds)
2025-05-06 18:43:48,862 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:43:49,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:44:34,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 388.58072 ± 135.857
2025-05-06 18:44:34,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [290.77042, 424.65292, 294.53378, 466.4541, 297.72012, 743.0644, 360.95718, 352.8744, 415.00543, 239.7745]
2025-05-06 18:44:34,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [119.0, 149.0, 128.0, 162.0, 119.0, 271.0, 139.0, 137.0, 159.0, 106.0]
2025-05-06 18:44:35,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 17 minutes, 22 seconds)
2025-05-06 18:55:01,789 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:55:02,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:55:46,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 333.89249 ± 168.514
2025-05-06 18:55:46,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [265.17932, 76.07259, 476.4316, 161.87747, 438.291, 295.3825, 103.56364, 560.9621, 518.2728, 442.8915]
2025-05-06 18:55:46,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [126.0, 64.0, 219.0, 80.0, 183.0, 136.0, 57.0, 195.0, 224.0, 167.0]
2025-05-06 18:55:48,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 9 minutes, 42 seconds)
2025-05-06 19:06:00,330 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:06:00,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:06:43,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 356.02347 ± 93.255
2025-05-06 19:06:43,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [475.12094, 285.14465, 236.67348, 433.71277, 470.02832, 379.03275, 241.71492, 246.8433, 449.69516, 342.26865]
2025-05-06 19:06:43,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [170.0, 118.0, 104.0, 161.0, 175.0, 171.0, 104.0, 107.0, 164.0, 130.0]
2025-05-06 19:06:45,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 14 hours, 58 minutes, 54 seconds)
2025-05-06 19:17:10,416 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:17:10,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:17:44,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 260.44193 ± 97.158
2025-05-06 19:17:44,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [140.36128, 317.4675, 154.6717, 357.64966, 196.65959, 290.49554, 130.1319, 246.78163, 422.98297, 347.2176]
2025-05-06 19:17:44,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 141.0, 74.0, 160.0, 87.0, 127.0, 70.0, 106.0, 158.0, 128.0]
2025-05-06 19:17:44,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 47 minutes, 24 seconds)
2025-05-06 19:28:03,478 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:28:03,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:28:42,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 312.72501 ± 130.977
2025-05-06 19:28:42,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [364.7058, 229.99109, 108.41106, 529.8164, 507.07568, 207.1744, 334.71765, 384.10706, 179.84859, 281.40237]
2025-05-06 19:28:42,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [150.0, 97.0, 84.0, 186.0, 199.0, 88.0, 133.0, 147.0, 84.0, 121.0]
2025-05-06 19:28:43,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 32 minutes, 25 seconds)
2025-05-06 19:39:08,095 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:39:08,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:40:02,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 459.34113 ± 322.864
2025-05-06 19:40:02,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [596.4712, 190.62105, 218.41908, 1117.2178, 117.1042, 938.4667, 288.42618, 577.57135, 309.06824, 240.04521]
2025-05-06 19:40:02,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [237.0, 86.0, 96.0, 464.0, 63.0, 329.0, 116.0, 187.0, 123.0, 101.0]
2025-05-06 19:40:02,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (459.34) for latency ExtremeSparseL4U32
2025-05-06 19:40:02,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 19:40:02,595 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 19:40:02,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 25 minutes, 9 seconds)
2025-05-06 19:50:21,199 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:50:21,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:51:02,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 336.85239 ± 176.085
2025-05-06 19:51:02,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [80.716835, 224.07278, 296.93182, 465.4121, 343.4855, 666.9144, 495.21814, 123.01692, 466.4238, 206.33154]
2025-05-06 19:51:02,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [59.0, 107.0, 126.0, 193.0, 143.0, 211.0, 168.0, 64.0, 194.0, 95.0]
2025-05-06 19:51:02,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 10 minutes, 37 seconds)
2025-05-06 20:01:27,479 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:01:27,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:02:23,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 417.67563 ± 203.613
2025-05-06 20:02:23,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [165.4616, 597.9572, 495.36206, 580.60455, 390.35953, 805.22424, 83.26483, 435.3417, 299.4121, 323.76874]
2025-05-06 20:02:23,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [105.0, 244.0, 226.0, 253.0, 169.0, 308.0, 64.0, 198.0, 125.0, 129.0]
2025-05-06 20:02:23,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 5 minutes, 44 seconds)
2025-05-06 20:12:39,906 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:12:39,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:13:42,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 487.69263 ± 347.051
2025-05-06 20:13:42,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [136.55836, 386.53705, 1341.7185, 518.3998, 206.4725, 275.31863, 897.9517, 364.60977, 435.59338, 313.7664]
2025-05-06 20:13:42,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 153.0, 521.0, 207.0, 96.0, 123.0, 380.0, 154.0, 207.0, 136.0]
2025-05-06 20:13:42,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (487.69) for latency ExtremeSparseL4U32
2025-05-06 20:13:42,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 20:13:42,382 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 20:13:42,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 59 minutes, 25 seconds)
2025-05-06 20:24:10,579 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:24:10,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:24:49,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 290.86285 ± 237.204
2025-05-06 20:24:49,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [174.60382, 568.33716, 445.38892, 138.29884, 148.07043, 24.855024, 77.99572, 812.77606, 343.201, 175.10167]
2025-05-06 20:24:49,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 228.0, 187.0, 73.0, 75.0, 28.0, 60.0, 306.0, 145.0, 87.0]
2025-05-06 20:24:49,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 50 minutes, 9 seconds)
2025-05-06 20:35:12,023 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:35:12,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:36:35,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 742.62097 ± 372.853
2025-05-06 20:36:35,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1378.8687, 1209.7897, 578.5223, 912.7332, 820.63824, 65.24624, 709.531, 395.84628, 436.1478, 918.88605]
2025-05-06 20:36:35,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [521.0, 411.0, 203.0, 342.0, 304.0, 60.0, 265.0, 155.0, 166.0, 350.0]
2025-05-06 20:36:35,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (742.62) for latency ExtremeSparseL4U32
2025-05-06 20:36:35,821 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-06 20:36:35,831 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 20:36:35,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 45 minutes, 41 seconds)
2025-05-06 20:46:56,985 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:46:56,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:47:54,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 469.68695 ± 294.426
2025-05-06 20:47:54,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [140.79234, 514.31714, 377.241, 172.7918, 360.4872, 597.8032, 640.19806, 265.75696, 411.08478, 1216.3973]
2025-05-06 20:47:54,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [86.0, 199.0, 152.0, 82.0, 140.0, 243.0, 230.0, 119.0, 163.0, 466.0]
2025-05-06 20:47:54,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 39 minutes, 4 seconds)
2025-05-06 20:58:06,662 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:58:06,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:58:44,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 267.58755 ± 227.162
2025-05-06 20:58:44,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [777.2817, 315.42953, 91.90537, 23.592594, 191.61008, 161.16116, 67.16985, 231.62154, 589.0659, 227.03793]
2025-05-06 20:58:44,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [320.0, 138.0, 53.0, 27.0, 107.0, 79.0, 42.0, 131.0, 231.0, 114.0]
2025-05-06 20:58:44,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 20 minutes, 12 seconds)
2025-05-06 21:09:06,176 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:09:06,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:10:03,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 476.70660 ± 217.960
2025-05-06 21:10:03,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [665.42346, 407.53897, 303.3413, 482.89383, 767.4882, 346.2536, 107.642975, 467.53558, 353.0984, 865.84955]
2025-05-06 21:10:03,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [225.0, 170.0, 124.0, 205.0, 269.0, 146.0, 57.0, 203.0, 146.0, 312.0]
2025-05-06 21:10:03,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 8 minutes, 52 seconds)
2025-05-06 21:20:22,421 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:20:22,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:21:12,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 413.65372 ± 164.940
2025-05-06 21:21:12,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [193.12907, 335.50943, 727.70337, 423.0269, 400.28723, 213.50293, 292.38504, 621.89496, 377.6315, 551.46655]
2025-05-06 21:21:12,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [89.0, 143.0, 247.0, 161.0, 157.0, 93.0, 122.0, 238.0, 153.0, 216.0]
2025-05-06 21:21:12,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 58 minutes, 11 seconds)
2025-05-06 21:31:36,907 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:31:36,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:32:21,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 347.26163 ± 170.781
2025-05-06 21:32:21,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [605.80334, 264.64984, 253.5553, 108.49459, 619.5077, 554.37585, 326.3105, 271.89752, 275.3433, 192.67828]
2025-05-06 21:32:21,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [262.0, 112.0, 111.0, 60.0, 264.0, 201.0, 136.0, 115.0, 121.0, 90.0]
2025-05-06 21:32:22,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 38 minutes, 29 seconds)
2025-05-06 21:42:56,048 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:42:56,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:43:50,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 445.91934 ± 201.906
2025-05-06 21:43:50,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [210.76927, 404.33865, 229.2121, 508.1383, 413.25116, 713.05383, 494.40076, 599.6611, 125.87269, 760.49536]
2025-05-06 21:43:50,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [101.0, 160.0, 123.0, 172.0, 163.0, 268.0, 192.0, 221.0, 65.0, 295.0]
2025-05-06 21:43:50,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 29 minutes, 21 seconds)
2025-05-06 21:53:53,856 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:53:54,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:55:14,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 705.49872 ± 591.572
2025-05-06 21:55:14,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [296.5223, 731.5807, 402.7325, 878.5781, 124.868195, 461.73334, 386.89743, 1704.8015, 168.19244, 1899.081]
2025-05-06 21:55:14,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [120.0, 287.0, 161.0, 325.0, 64.0, 199.0, 161.0, 583.0, 83.0, 697.0]
2025-05-06 21:55:14,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 25 minutes, 43 seconds)
2025-05-06 22:06:00,664 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:06:00,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:06:53,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 439.67276 ± 190.674
2025-05-06 22:06:53,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [527.69415, 630.61456, 535.7278, 709.6474, 320.74356, 270.45596, 306.33963, 107.277725, 320.50937, 667.71735]
2025-05-06 22:06:53,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [214.0, 236.0, 196.0, 270.0, 139.0, 122.0, 136.0, 57.0, 135.0, 253.0]
2025-05-06 22:06:54,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 19 minutes)
2025-05-06 22:16:48,870 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:16:48,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:17:43,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 463.13019 ± 280.425
2025-05-06 22:17:43,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [759.2087, 170.50668, 565.6633, 302.78018, 196.97607, 619.7462, 192.23073, 1026.2129, 195.49165, 602.4852]
2025-05-06 22:17:43,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [272.0, 83.0, 215.0, 140.0, 92.0, 248.0, 89.0, 350.0, 112.0, 200.0]
2025-05-06 22:17:43,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 3 minutes, 20 seconds)
2025-05-06 22:28:09,082 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:28:09,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:29:25,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 706.72180 ± 292.081
2025-05-06 22:29:26,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [641.5741, 760.2223, 95.62027, 802.4083, 835.60974, 932.9053, 1062.4614, 1025.2344, 311.54626, 599.6358]
2025-05-06 22:29:26,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [257.0, 264.0, 55.0, 287.0, 283.0, 334.0, 360.0, 337.0, 132.0, 228.0]
2025-05-06 22:29:26,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 59 minutes, 2 seconds)
2025-05-06 22:39:44,199 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:39:44,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:40:40,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 479.14542 ± 289.269
2025-05-06 22:40:40,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [769.6844, 35.264805, 387.41864, 641.72675, 299.6849, 1053.812, 149.8428, 561.5169, 588.7936, 303.70944]
2025-05-06 22:40:40,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [279.0, 30.0, 185.0, 231.0, 128.0, 370.0, 76.0, 234.0, 226.0, 130.0]
2025-05-06 22:40:40,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 44 minutes, 47 seconds)
2025-05-06 22:50:59,473 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:50:59,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:52:01,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 527.90088 ± 298.954
2025-05-06 22:52:01,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [587.852, 174.09807, 385.72183, 724.80786, 228.15666, 1091.9225, 436.4318, 877.74066, 638.19336, 134.08403]
2025-05-06 22:52:01,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [227.0, 94.0, 168.0, 266.0, 106.0, 403.0, 181.0, 284.0, 253.0, 70.0]
2025-05-06 22:52:01,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 32 minutes, 46 seconds)
2025-05-06 23:02:22,485 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:02:22,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:03:18,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 437.15845 ± 281.396
2025-05-06 23:03:18,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [425.82788, 507.61346, 163.22745, 734.1614, 298.11008, 255.11302, 174.61484, 282.6223, 1127.8167, 402.47757]
2025-05-06 23:03:18,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [193.0, 199.0, 81.0, 295.0, 122.0, 109.0, 101.0, 126.0, 442.0, 172.0]
2025-05-06 23:03:18,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 16 minutes, 54 seconds)
2025-05-06 23:13:42,818 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:13:42,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:14:41,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 447.82779 ± 416.658
2025-05-06 23:14:41,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [598.3136, 644.5149, 516.4029, 167.28734, 78.01069, 126.24501, 120.97714, 216.18842, 471.60767, 1538.7303]
2025-05-06 23:14:41,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [238.0, 268.0, 236.0, 81.0, 64.0, 79.0, 65.0, 101.0, 231.0, 572.0]
2025-05-06 23:14:41,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 12 minutes, 21 seconds)
2025-05-06 23:25:01,205 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:25:01,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:26:24,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 682.10406 ± 260.987
2025-05-06 23:26:25,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1007.93256, 724.9701, 795.22217, 261.66116, 571.0442, 521.32715, 825.6243, 240.18602, 932.6355, 940.43713]
2025-05-06 23:26:25,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [393.0, 289.0, 321.0, 122.0, 230.0, 225.0, 318.0, 110.0, 363.0, 380.0]
2025-05-06 23:26:25,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 1 minute, 2 seconds)
2025-05-06 23:37:01,252 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:37:01,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:37:54,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 427.53653 ± 349.565
2025-05-06 23:37:54,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [237.36926, 284.09903, 380.2502, 423.2759, 1362.1385, 133.0001, 677.657, 84.91159, 415.5769, 277.08643]
2025-05-06 23:37:54,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [110.0, 130.0, 156.0, 174.0, 511.0, 69.0, 289.0, 49.0, 176.0, 114.0]
2025-05-06 23:37:55,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 10 hours, 52 minutes, 36 seconds)
2025-05-06 23:48:02,089 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:48:02,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:49:20,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 657.72522 ± 521.306
2025-05-06 23:49:20,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1058.6512, 417.50522, 1544.2844, 1613.0897, 60.96292, 522.4602, 382.50125, 260.20917, 441.0542, 276.53387]
2025-05-06 23:49:20,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [368.0, 173.0, 584.0, 602.0, 37.0, 225.0, 182.0, 124.0, 183.0, 129.0]
2025-05-06 23:49:21,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 42 minutes)
2025-05-07 00:00:05,643 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:00:05,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:01:39,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 787.78845 ± 310.146
2025-05-07 00:01:42,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [767.40546, 729.1546, 442.60764, 643.51447, 458.76666, 808.89685, 399.477, 1184.9039, 1332.3785, 1110.78]
2025-05-07 00:01:42,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [291.0, 289.0, 183.0, 250.0, 192.0, 332.0, 171.0, 443.0, 506.0, 417.0]
2025-05-07 00:01:42,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (787.79) for latency ExtremeSparseL4U32
2025-05-07 00:01:42,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-07 00:01:42,141 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 00:01:43,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 42 minutes, 32 seconds)
2025-05-07 00:11:50,646 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:11:50,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:13:01,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 586.95032 ± 343.183
2025-05-07 00:13:02,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [410.18167, 883.59326, 1308.3588, 640.4413, 659.1138, 218.0744, 121.44164, 703.5455, 204.88422, 719.8687]
2025-05-07 00:13:02,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [169.0, 323.0, 482.0, 276.0, 285.0, 99.0, 86.0, 280.0, 97.0, 260.0]
2025-05-07 00:13:02,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 30 minutes, 5 seconds)
2025-05-07 00:23:22,978 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:23:22,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:25:12,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 922.38104 ± 754.963
2025-05-07 00:25:14,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1508.2638, 495.04004, 2273.7327, 286.67038, 110.591774, 1127.8167, 592.3495, 2118.4111, 528.8607, 182.07423]
2025-05-07 00:25:14,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [593.0, 198.0, 831.0, 129.0, 61.0, 431.0, 259.0, 799.0, 216.0, 87.0]
2025-05-07 00:25:14,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (922.38) for latency ExtremeSparseL4U32
2025-05-07 00:25:14,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-07 00:25:14,896 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 00:25:19,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 24 minutes, 18 seconds)
2025-05-07 00:35:47,774 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:35:48,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:37:07,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 643.49640 ± 651.303
2025-05-07 00:37:07,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [341.31424, 1968.4108, 360.35794, 88.74816, 191.60287, 363.2025, 22.089125, 1251.9161, 1560.5496, 286.7726]
2025-05-07 00:37:07,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [141.0, 736.0, 147.0, 50.0, 109.0, 146.0, 26.0, 471.0, 595.0, 135.0]
2025-05-07 00:37:07,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 15 minutes, 39 seconds)
2025-05-07 00:47:15,770 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:47:15,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:48:48,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 800.61145 ± 454.957
2025-05-07 00:48:48,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [123.19212, 1537.2587, 1235.5201, 842.0617, 1175.0358, 658.56854, 616.8703, 1217.396, 365.6555, 234.55579]
2025-05-07 00:48:48,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [79.0, 591.0, 435.0, 286.0, 448.0, 248.0, 205.0, 465.0, 159.0, 131.0]
2025-05-07 00:48:48,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 6 minutes, 27 seconds)
2025-05-07 00:59:14,197 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:59:14,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:00:55,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 856.00995 ± 426.855
2025-05-07 01:00:55,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1464.5742, 425.5211, 1581.6802, 541.3327, 563.23956, 1295.7452, 586.90845, 632.6093, 1058.055, 410.43335]
2025-05-07 01:00:55,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [554.0, 173.0, 618.0, 221.0, 229.0, 513.0, 249.0, 228.0, 407.0, 172.0]
2025-05-07 01:00:55,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 51 minutes, 58 seconds)
2025-05-07 01:11:36,774 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:11:36,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:12:56,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 660.96063 ± 786.307
2025-05-07 01:12:56,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [624.37866, 2706.963, 1290.2411, 112.619576, 111.38394, 102.724785, 220.73491, 371.3143, 101.39427, 967.8521]
2025-05-07 01:12:56,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [261.0, 1000.0, 489.0, 59.0, 58.0, 79.0, 104.0, 146.0, 56.0, 361.0]
2025-05-07 01:12:56,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 46 minutes, 57 seconds)
2025-05-07 01:23:16,293 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:23:16,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:24:50,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 816.52209 ± 700.386
2025-05-07 01:24:50,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [519.74713, 2411.2266, 1188.6726, 116.848755, 503.60303, 548.992, 944.99536, 1599.5422, 175.84534, 155.74768]
2025-05-07 01:24:50,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [203.0, 879.0, 449.0, 64.0, 197.0, 205.0, 350.0, 572.0, 83.0, 75.0]
2025-05-07 01:24:50,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 31 minutes, 25 seconds)
2025-05-07 01:35:01,843 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:35:01,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:36:21,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 663.16101 ± 572.996
2025-05-07 01:36:21,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [437.51562, 426.75497, 398.28378, 620.3824, 1993.7823, 206.91049, 298.6829, 372.8039, 322.92502, 1553.5691]
2025-05-07 01:36:21,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [180.0, 176.0, 159.0, 238.0, 747.0, 95.0, 150.0, 158.0, 164.0, 573.0]
2025-05-07 01:36:21,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 16 minutes, 52 seconds)
2025-05-07 01:46:34,333 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:46:34,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:47:48,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 616.55548 ± 400.705
2025-05-07 01:47:48,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [24.951124, 670.0847, 1088.3688, 506.61313, 407.86508, 390.7008, 1055.8162, 442.8543, 223.09825, 1355.2024]
2025-05-07 01:47:48,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [27.0, 270.0, 405.0, 210.0, 162.0, 157.0, 369.0, 172.0, 119.0, 523.0]
2025-05-07 01:47:48,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 2 minutes, 51 seconds)
2025-05-07 01:58:12,616 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:58:12,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:59:51,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 867.95520 ± 785.399
2025-05-07 01:59:51,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1435.1344, 167.00687, 128.84903, 2747.878, 98.25123, 589.93823, 1370.2267, 995.3787, 859.26044, 287.6283]
2025-05-07 01:59:51,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [491.0, 79.0, 85.0, 1000.0, 81.0, 254.0, 474.0, 362.0, 311.0, 117.0]
2025-05-07 01:59:51,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 50 minutes, 22 seconds)
2025-05-07 02:10:24,952 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:10:25,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:11:34,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 586.20911 ± 320.028
2025-05-07 02:11:34,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [788.96515, 698.5422, 352.80136, 234.39227, 891.9345, 153.54358, 1061.5482, 639.7103, 890.48584, 150.16737]
2025-05-07 02:11:34,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [307.0, 261.0, 141.0, 121.0, 317.0, 74.0, 396.0, 258.0, 323.0, 72.0]
2025-05-07 02:11:34,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 35 minutes, 58 seconds)
2025-05-07 02:22:12,470 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:22:12,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:24:17,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1084.68140 ± 837.260
2025-05-07 02:24:17,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [197.64369, 443.98013, 1121.6416, 988.9914, 1639.5779, 1572.8159, 2814.8074, 1772.5115, 19.661451, 275.18225]
2025-05-07 02:24:17,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [112.0, 189.0, 421.0, 404.0, 592.0, 597.0, 1000.0, 654.0, 23.0, 127.0]
2025-05-07 02:24:17,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1084.68) for latency ExtremeSparseL4U32
2025-05-07 02:24:17,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-07 02:24:17,423 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 02:24:17,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 31 minutes, 15 seconds)
2025-05-07 02:34:26,406 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:34:26,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:35:43,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 650.88293 ± 431.353
2025-05-07 02:35:43,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1518.5477, 485.1729, 1170.1599, 389.7647, 370.26886, 193.60571, 673.11945, 1011.31525, 82.62027, 614.25507]
2025-05-07 02:35:43,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [548.0, 215.0, 441.0, 159.0, 150.0, 89.0, 266.0, 351.0, 65.0, 238.0]
2025-05-07 02:35:43,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 18 minutes, 40 seconds)
2025-05-07 02:46:05,846 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:46:05,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:47:10,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 550.39020 ± 287.287
2025-05-07 02:47:10,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [802.0867, 363.9118, 1167.1365, 461.3903, 121.866356, 654.3963, 771.2891, 453.42627, 422.3317, 286.06693]
2025-05-07 02:47:10,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [310.0, 146.0, 402.0, 184.0, 64.0, 235.0, 296.0, 183.0, 183.0, 131.0]
2025-05-07 02:47:10,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 6 minutes, 41 seconds)
2025-05-07 02:57:18,373 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:57:18,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:59:53,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1410.07544 ± 843.105
2025-05-07 02:59:53,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2738.4211, 1086.9097, 418.56433, 2890.1443, 436.56665, 426.5741, 1538.8351, 1404.5958, 1775.2922, 1384.8501]
2025-05-07 02:59:53,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 405.0, 171.0, 1000.0, 181.0, 170.0, 559.0, 508.0, 638.0, 492.0]
2025-05-07 02:59:53,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1410.08) for latency ExtremeSparseL4U32
2025-05-07 02:59:53,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-07 02:59:53,876 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 02:59:53,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 22 seconds)
2025-05-07 03:10:16,580 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:10:16,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:11:41,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 723.29773 ± 496.885
2025-05-07 03:11:42,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [389.9919, 851.6345, 454.88385, 312.69284, 145.63217, 385.75546, 652.30035, 1906.5138, 1144.5588, 989.01324]
2025-05-07 03:11:42,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [156.0, 334.0, 200.0, 127.0, 71.0, 169.0, 268.0, 674.0, 412.0, 364.0]
2025-05-07 03:11:42,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 49 minutes, 3 seconds)
2025-05-07 03:21:55,432 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:21:55,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:24:01,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1161.62256 ± 870.514
2025-05-07 03:24:01,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [899.89233, 2935.7478, 426.58694, 1774.8668, 624.27875, 678.6714, 407.47308, 234.1962, 1253.776, 2380.7366]
2025-05-07 03:24:01,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [331.0, 1000.0, 172.0, 621.0, 249.0, 263.0, 167.0, 106.0, 457.0, 838.0]
2025-05-07 03:24:01,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 34 minutes, 2 seconds)
2025-05-07 03:35:07,801 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:35:07,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:36:47,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 884.09943 ± 696.387
2025-05-07 03:36:47,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [876.13464, 244.26738, 2410.0981, 1484.3234, 694.64056, 81.10945, 425.97104, 435.54218, 1628.7266, 560.1813]
2025-05-07 03:36:47,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [320.0, 109.0, 860.0, 539.0, 259.0, 60.0, 187.0, 186.0, 594.0, 207.0]
2025-05-07 03:36:47,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 31 minutes, 55 seconds)
2025-05-07 03:46:36,434 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:46:36,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:47:52,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 625.81702 ± 629.873
2025-05-07 03:47:52,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [101.13928, 77.37614, 402.73767, 99.93881, 1210.9814, 104.58723, 1294.9077, 741.0909, 242.12508, 1983.2863]
2025-05-07 03:47:52,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [75.0, 62.0, 181.0, 71.0, 456.0, 56.0, 473.0, 287.0, 123.0, 692.0]
2025-05-07 03:47:52,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 17 minutes, 5 seconds)
2025-05-07 03:58:02,674 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:58:02,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:59:16,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 630.66272 ± 577.917
2025-05-07 03:59:16,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [916.35486, 19.464067, 84.39514, 394.75262, 1996.9148, 1198.3394, 209.20468, 378.03094, 343.98087, 765.18964]
2025-05-07 03:59:16,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [330.0, 25.0, 67.0, 156.0, 703.0, 411.0, 114.0, 151.0, 142.0, 309.0]
2025-05-07 03:59:16,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 55 minutes, 36 seconds)
2025-05-07 04:10:30,951 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:10:30,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:12:02,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 783.65289 ± 624.332
2025-05-07 04:12:02,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [967.3959, 2281.1255, 1113.9592, 676.73773, 637.9007, 101.85665, 761.04846, 91.87266, 1087.215, 117.416565]
2025-05-07 04:12:02,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [362.0, 826.0, 403.0, 251.0, 248.0, 73.0, 303.0, 52.0, 405.0, 62.0]
2025-05-07 04:12:02,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 50 minutes, 15 seconds)
2025-05-07 04:21:27,824 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:21:27,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:22:58,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 795.08270 ± 845.178
2025-05-07 04:22:58,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [744.8584, 184.18785, 406.59872, 654.0793, 1939.2815, 147.65372, 422.13437, 381.31677, 2851.0476, 219.66893]
2025-05-07 04:22:58,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [293.0, 86.0, 150.0, 244.0, 686.0, 71.0, 167.0, 148.0, 1000.0, 96.0]
2025-05-07 04:22:58,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 28 minutes, 59 seconds)
2025-05-07 04:33:59,375 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:33:59,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:35:10,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 614.55133 ± 571.624
2025-05-07 04:35:10,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [310.06265, 220.60262, 422.0171, 1499.4915, 1319.4026, 241.62627, 439.3119, 1571.9537, 26.357975, 94.68693]
2025-05-07 04:35:10,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [129.0, 99.0, 150.0, 525.0, 496.0, 103.0, 183.0, 565.0, 28.0, 66.0]
2025-05-07 04:35:10,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 13 minutes, 36 seconds)
2025-05-07 04:45:03,440 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:45:03,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:46:57,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1035.54541 ± 926.653
2025-05-07 04:46:58,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [513.4962, 56.137997, 1697.3252, 216.73846, 225.83418, 72.469, 1190.8623, 2125.2197, 2857.4233, 1399.9484]
2025-05-07 04:46:58,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [195.0, 51.0, 607.0, 94.0, 120.0, 68.0, 428.0, 761.0, 987.0, 491.0]
2025-05-07 04:46:58,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 6 minutes, 24 seconds)
2025-05-07 04:57:40,461 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:57:41,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:59:36,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1009.61163 ± 669.577
2025-05-07 04:59:36,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [375.15228, 52.823563, 212.7839, 677.317, 1473.1974, 733.5314, 1461.533, 2200.282, 1219.5447, 1689.9513]
2025-05-07 04:59:36,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [178.0, 50.0, 96.0, 264.0, 551.0, 290.0, 515.0, 786.0, 458.0, 606.0]
2025-05-07 04:59:36,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 2 minutes, 3 seconds)
2025-05-07 05:10:15,270 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:10:15,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:11:55,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 876.74493 ± 577.919
2025-05-07 05:11:55,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [685.30774, 473.44568, 602.8992, 1475.3203, 649.9731, 229.5404, 455.64407, 1397.0159, 610.73737, 2187.5652]
2025-05-07 05:11:55,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [261.0, 189.0, 225.0, 542.0, 242.0, 116.0, 182.0, 535.0, 209.0, 782.0]
2025-05-07 05:11:55,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 47 minutes, 20 seconds)
2025-05-07 05:21:52,723 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:21:52,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:24:02,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1176.84412 ± 946.140
2025-05-07 05:24:02,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [200.02106, 2852.6707, 1246.1492, 277.0177, 961.17, 1762.8866, 2807.7864, 433.5559, 876.1, 351.0836]
2025-05-07 05:24:02,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [91.0, 1000.0, 447.0, 136.0, 354.0, 616.0, 1000.0, 173.0, 325.0, 167.0]
2025-05-07 05:24:02,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 41 minutes, 59 seconds)
2025-05-07 05:34:51,708 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:34:52,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:37:57,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1727.78577 ± 959.505
2025-05-07 05:37:57,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1783.466, 720.7204, 2594.9126, 1786.8951, 1711.2145, 85.546265, 482.84796, 2239.1677, 2897.7427, 2975.3438]
2025-05-07 05:37:57,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [622.0, 267.0, 873.0, 621.0, 583.0, 67.0, 194.0, 803.0, 1000.0, 1000.0]
2025-05-07 05:37:57,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1727.79) for latency ExtremeSparseL4U32
2025-05-07 05:37:57,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-07 05:37:57,398 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-hopper/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 05:37:57,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 39 minutes)
2025-05-07 05:47:55,268 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:47:55,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:49:40,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 944.36542 ± 824.649
2025-05-07 05:49:40,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [839.10034, 324.05524, 1489.0062, 467.68515, 296.65726, 208.30766, 1433.973, 187.81566, 1249.0629, 2947.9902]
2025-05-07 05:49:40,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [307.0, 138.0, 515.0, 183.0, 126.0, 111.0, 508.0, 89.0, 441.0, 1000.0]
2025-05-07 05:49:40,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 26 minutes, 3 seconds)
2025-05-07 06:00:38,291 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:00:38,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:02:20,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 941.34778 ± 497.817
2025-05-07 06:02:20,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [878.13055, 695.0449, 2039.3833, 430.74408, 580.3323, 527.09796, 1522.1091, 612.6864, 1355.3939, 772.5559]
2025-05-07 06:02:20,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [302.0, 260.0, 708.0, 185.0, 248.0, 204.0, 543.0, 213.0, 508.0, 266.0]
2025-05-07 06:02:20,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 13 minutes, 38 seconds)
2025-05-07 06:12:01,005 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:12:01,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:13:46,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 902.58777 ± 670.195
2025-05-07 06:13:47,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [229.37694, 387.09924, 1431.1091, 2574.0347, 601.62225, 845.3259, 748.28656, 918.48346, 1119.2196, 171.32037]
2025-05-07 06:13:47,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [127.0, 172.0, 502.0, 932.0, 228.0, 316.0, 284.0, 342.0, 412.0, 83.0]
2025-05-07 06:13:47,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 56 minutes, 56 seconds)
2025-05-07 06:24:01,235 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:24:02,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:25:56,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1041.47815 ± 824.603
2025-05-07 06:25:57,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [205.22441, 710.39813, 1520.2915, 1820.3243, 621.1588, 29.728504, 1027.774, 1228.7354, 2897.779, 353.3667]
2025-05-07 06:25:57,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [93.0, 259.0, 528.0, 636.0, 239.0, 34.0, 363.0, 461.0, 1000.0, 146.0]
2025-05-07 06:25:57,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 44 minutes, 46 seconds)
2025-05-07 06:36:44,239 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:36:44,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:38:52,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1164.35815 ± 939.939
2025-05-07 06:38:52,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1041.091, 749.853, 28.06626, 1551.1145, 525.1295, 2814.894, 1214.0226, 314.05286, 525.99786, 2879.3608]
2025-05-07 06:38:52,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [374.0, 271.0, 32.0, 563.0, 218.0, 1000.0, 393.0, 134.0, 213.0, 1000.0]
2025-05-07 06:38:52,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 28 minutes, 2 seconds)
2025-05-07 06:48:55,529 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:48:55,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:50:10,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 643.01825 ± 461.116
2025-05-07 06:50:10,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [672.31604, 236.09453, 330.88736, 220.58876, 1544.6989, 1262.427, 1057.9081, 127.632614, 520.07275, 457.5566]
2025-05-07 06:50:10,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [267.0, 117.0, 146.0, 99.0, 533.0, 433.0, 381.0, 79.0, 201.0, 198.0]
2025-05-07 06:50:10,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 14 minutes, 5 seconds)
2025-05-07 07:00:35,338 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:00:35,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:01:34,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 493.82611 ± 334.989
2025-05-07 07:01:34,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [454.01578, 379.60803, 997.6626, 203.10039, 237.0969, 451.8202, 549.21155, 153.18773, 1227.7842, 284.77405]
2025-05-07 07:01:34,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [173.0, 149.0, 392.0, 90.0, 106.0, 174.0, 186.0, 74.0, 422.0, 162.0]
2025-05-07 07:01:34,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 56 minutes, 56 seconds)
2025-05-07 07:12:23,445 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:12:23,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:13:37,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 677.32837 ± 827.063
2025-05-07 07:13:37,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [234.86894, 103.76682, 210.21298, 198.6641, 429.3962, 1708.4465, 134.78683, 710.8649, 2750.79, 291.48642]
2025-05-07 07:13:37,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [106.0, 57.0, 115.0, 91.0, 169.0, 585.0, 78.0, 246.0, 889.0, 146.0]
2025-05-07 07:13:37,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 47 minutes, 24 seconds)
2025-05-07 07:23:35,150 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:23:36,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:25:32,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1067.48364 ± 784.956
2025-05-07 07:25:32,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [113.607475, 1010.285, 414.74387, 2900.7625, 910.8449, 1088.9258, 740.9945, 2052.9084, 477.46454, 964.2986]
2025-05-07 07:25:32,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [73.0, 367.0, 165.0, 1000.0, 335.0, 394.0, 278.0, 712.0, 183.0, 329.0]
2025-05-07 07:25:32,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 34 minutes, 32 seconds)
2025-05-07 07:36:08,088 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:36:08,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:38:28,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1245.48999 ± 939.083
2025-05-07 07:38:29,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1016.87573, 217.20699, 2792.3757, 219.68169, 2818.524, 796.8621, 1033.1929, 1542.085, 189.45944, 1828.6367]
2025-05-07 07:38:29,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [366.0, 103.0, 950.0, 113.0, 1000.0, 290.0, 388.0, 566.0, 89.0, 660.0]
2025-05-07 07:38:29,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 22 minutes, 42 seconds)
2025-05-07 07:48:50,591 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:48:50,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:50:18,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 753.53137 ± 953.343
2025-05-07 07:50:18,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [416.5019, 2392.8228, 177.72421, 2861.7622, 389.38556, 401.53122, 519.48535, 185.5887, 20.815378, 169.69695]
2025-05-07 07:50:18,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [188.0, 817.0, 101.0, 1000.0, 157.0, 161.0, 215.0, 93.0, 26.0, 97.0]
2025-05-07 07:50:18,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 12 minutes, 26 seconds)
2025-05-07 08:00:42,072 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:00:42,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:02:22,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 915.56018 ± 903.277
2025-05-07 08:02:22,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [314.60422, 2064.0906, 307.8333, 81.22884, 448.2028, 3063.1243, 445.91867, 1265.2378, 486.55466, 678.8069]
2025-05-07 08:02:22,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [134.0, 687.0, 133.0, 63.0, 188.0, 1000.0, 179.0, 427.0, 217.0, 266.0]
2025-05-07 08:02:22,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 2 minutes, 23 seconds)
2025-05-07 08:12:28,180 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:12:28,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:14:43,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1279.03943 ± 975.789
2025-05-07 08:14:43,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [840.36346, 2262.4778, 283.67783, 1849.6244, 274.49738, 684.1499, 2421.8562, 962.9575, 179.94867, 3030.8413]
2025-05-07 08:14:43,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [298.0, 779.0, 121.0, 635.0, 117.0, 244.0, 817.0, 336.0, 98.0, 1000.0]
2025-05-07 08:14:43,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 51 minutes, 5 seconds)
2025-05-07 08:25:07,032 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:25:07,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:27:03,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1076.96375 ± 822.091
2025-05-07 08:27:03,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2998.318, 1272.7688, 1660.4495, 1069.3943, 149.66794, 1159.5151, 381.7584, 694.1453, 1360.6296, 22.990692]
2025-05-07 08:27:03,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 443.0, 591.0, 379.0, 79.0, 456.0, 154.0, 276.0, 478.0, 27.0]
2025-05-07 08:27:03,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 39 minutes, 55 seconds)
2025-05-07 08:38:14,394 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:38:14,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:40:30,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1333.02966 ± 936.595
2025-05-07 08:40:30,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1822.7942, 131.53874, 1780.2527, 278.1159, 1512.8566, 2652.995, 2616.6228, 595.95197, 90.35822, 1848.8107]
2025-05-07 08:40:30,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [605.0, 66.0, 602.0, 143.0, 537.0, 857.0, 868.0, 233.0, 72.0, 580.0]
2025-05-07 08:40:30,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 28 minutes, 49 seconds)
2025-05-07 08:50:13,351 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:50:13,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:52:40,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1405.76868 ± 1194.764
2025-05-07 08:52:41,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [236.83197, 992.3413, 294.6554, 121.72606, 2775.0583, 1116.5719, 181.15848, 2082.3938, 3163.2869, 3093.6619]
2025-05-07 08:52:41,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [104.0, 359.0, 143.0, 85.0, 866.0, 384.0, 112.0, 710.0, 1000.0, 1000.0]
2025-05-07 08:52:42,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 17 minutes, 15 seconds)
2025-05-07 09:02:47,641 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:02:47,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:03:43,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 478.78760 ± 410.475
2025-05-07 09:03:43,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1609.0083, 325.28802, 255.62001, 750.48364, 241.84822, 121.221214, 275.41858, 386.5654, 318.1459, 504.2767]
2025-05-07 09:03:43,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [543.0, 145.0, 106.0, 264.0, 122.0, 65.0, 122.0, 150.0, 131.0, 186.0]
2025-05-07 09:03:43,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2025-05-07 09:14:02,040 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:14:02,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:16:02,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1110.22144 ± 1050.748
2025-05-07 09:16:02,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [137.53574, 164.76387, 2998.084, 1474.4995, 2926.545, 113.4969, 481.92892, 1625.9595, 656.3097, 523.0911]
2025-05-07 09:16:02,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [92.0, 109.0, 1000.0, 504.0, 1000.0, 79.0, 211.0, 573.0, 249.0, 202.0]
2025-05-07 09:16:02,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 50 minutes, 22 seconds)
2025-05-07 09:26:34,503 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:26:34,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:27:39,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 576.32843 ± 578.087
2025-05-07 09:27:39,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [152.30254, 105.66919, 1994.8976, 723.755, 397.3878, 128.09972, 88.76748, 450.18533, 502.15543, 1220.0642]
2025-05-07 09:27:39,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [75.0, 81.0, 672.0, 269.0, 159.0, 80.0, 51.0, 190.0, 174.0, 382.0]
2025-05-07 09:27:39,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 36 minutes, 57 seconds)
2025-05-07 09:37:59,444 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:37:59,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:39:28,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 819.05920 ± 1099.292
2025-05-07 09:39:28,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [729.2344, 144.3044, 143.22357, 122.86574, 3000.9783, 2975.5923, 87.724396, 385.2879, 238.20181, 363.18]
2025-05-07 09:39:28,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [265.0, 71.0, 70.0, 63.0, 1000.0, 1000.0, 66.0, 166.0, 114.0, 178.0]
2025-05-07 09:39:28,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 22 minutes, 34 seconds)
2025-05-07 09:49:48,951 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:49:48,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:51:54,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1221.93835 ± 1026.465
2025-05-07 09:51:54,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [275.74597, 485.39764, 103.566055, 3079.56, 1686.6967, 3059.7703, 986.2425, 1294.0945, 505.93567, 742.3754]
2025-05-07 09:51:54,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [139.0, 187.0, 56.0, 1000.0, 554.0, 1000.0, 310.0, 443.0, 215.0, 234.0]
2025-05-07 09:51:54,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 11 minutes, 2 seconds)
2025-05-07 10:03:00,824 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:03:00,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:05:01,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1116.91479 ± 912.962
2025-05-07 10:05:01,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [491.4087, 297.81445, 1566.3224, 1512.6869, 2965.5334, 2331.5217, 1008.54877, 137.41855, 656.757, 201.13617]
2025-05-07 10:05:01,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [183.0, 131.0, 564.0, 518.0, 1000.0, 799.0, 362.0, 83.0, 257.0, 92.0]
2025-05-07 10:05:01,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 1 minute, 17 seconds)
2025-05-07 10:15:08,514 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:15:08,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:17:24,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1271.70459 ± 1143.049
2025-05-07 10:17:24,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [642.6223, 2736.4922, 499.27515, 28.298191, 2874.2358, 292.69833, 689.552, 153.15178, 1801.5261, 2999.193]
2025-05-07 10:17:24,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [251.0, 953.0, 195.0, 30.0, 966.0, 129.0, 280.0, 75.0, 648.0, 1000.0]
2025-05-07 10:17:24,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 49 minutes, 5 seconds)
2025-05-07 10:27:38,669 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:27:38,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:30:05,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1390.23145 ± 1078.539
2025-05-07 10:30:06,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3063.9868, 81.2347, 969.2656, 461.98206, 1366.9475, 265.79745, 2968.359, 414.06198, 1839.4258, 2471.2544]
2025-05-07 10:30:06,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 46.0, 323.0, 178.0, 504.0, 132.0, 1000.0, 162.0, 624.0, 845.0]
2025-05-07 10:30:06,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 37 minutes, 28 seconds)
2025-05-07 10:40:43,330 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:40:43,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:43:20,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1515.00269 ± 970.643
2025-05-07 10:43:20,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3000.6218, 1864.532, 488.3272, 1189.346, 3036.751, 2304.24, 376.94827, 814.7009, 1650.2607, 424.29843]
2025-05-07 10:43:20,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 618.0, 183.0, 451.0, 1000.0, 761.0, 152.0, 286.0, 574.0, 165.0]
2025-05-07 10:43:20,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 25 minutes, 32 seconds)
2025-05-07 10:54:10,231 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:54:10,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:56:33,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1402.64575 ± 1070.068
2025-05-07 10:56:33,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [708.0017, 3018.0989, 146.0309, 1037.849, 432.66446, 1166.1012, 2808.5535, 3114.9126, 858.37445, 735.8714]
2025-05-07 10:56:33,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [228.0, 1000.0, 71.0, 362.0, 170.0, 428.0, 916.0, 1000.0, 308.0, 287.0]
2025-05-07 10:56:33,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 55 seconds)
2025-05-07 11:07:01,288 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:07:01,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:09:05,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1118.95654 ± 847.406
2025-05-07 11:09:05,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1322.2714, 270.0419, 1306.2814, 2908.625, 417.8673, 1550.974, 290.47037, 979.34827, 2018.5464, 125.140015]
2025-05-07 11:09:05,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [461.0, 133.0, 446.0, 1000.0, 171.0, 545.0, 136.0, 364.0, 687.0, 90.0]
2025-05-07 11:09:05,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1149 [DEBUG]: Training session finished
