2025-05-01 22:21:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-05-01 22:21:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay
2025-05-01 22:21:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7f66f95b3b50>}
2025-05-01 22:21:15,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1009 [DEBUG]: using device: cuda
2025-05-01 22:21:15,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1031 [INFO]: Creating new trainer
2025-05-01 22:21:16,021 baseline-mbpac-noisy-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-05-01 22:21:16,021 baseline-mbpac-noisy-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-01 22:21:16,041 baseline-mbpac-noisy-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-05-01 22:21:16,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1092 [DEBUG]: Starting training session...
2025-05-01 22:21:16,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 1/100
2025-05-01 22:37:18,938 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 22:37:18,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 22:38:12,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 193.66563 ± 28.683
2025-05-01 22:38:12,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [205.83058, 133.02692, 201.71289, 222.38252, 187.2244, 177.19865, 209.4288, 206.33212, 159.52907, 233.99051]
2025-05-01 22:38:12,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [95.0, 69.0, 95.0, 104.0, 90.0, 84.0, 103.0, 97.0, 78.0, 116.0]
2025-05-01 22:38:12,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (193.67) for latency ExtremeClogL1U23
2025-05-01 22:38:12,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-01 22:38:12,673 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 22:38:12,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 2/100 (estimated time remaining: 27 hours, 56 minutes)
2025-05-01 22:55:57,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 22:55:57,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 22:56:44,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 231.24214 ± 93.133
2025-05-01 22:56:44,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [242.11751, 24.583628, 269.38846, 347.41022, 270.40454, 246.36595, 292.1508, 279.1727, 253.45708, 87.370316]
2025-05-01 22:56:44,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [108.0, 26.0, 119.0, 139.0, 122.0, 117.0, 127.0, 127.0, 113.0, 50.0]
2025-05-01 22:56:44,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (231.24) for latency ExtremeClogL1U23
2025-05-01 22:56:44,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-01 22:56:44,808 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 22:56:44,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 3/100 (estimated time remaining: 28 hours, 57 minutes, 46 seconds)
2025-05-01 23:13:38,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 23:13:38,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 23:14:29,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 229.32715 ± 38.944
2025-05-01 23:14:29,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [277.8966, 215.894, 243.14474, 171.656, 233.06583, 179.48636, 262.127, 223.69154, 293.75964, 192.54993]
2025-05-01 23:14:29,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [148.0, 100.0, 128.0, 91.0, 119.0, 108.0, 111.0, 102.0, 143.0, 96.0]
2025-05-01 23:14:29,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 4/100 (estimated time remaining: 28 hours, 40 minutes, 28 seconds)
2025-05-01 23:31:17,712 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 23:31:17,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 23:32:38,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 322.75775 ± 100.854
2025-05-01 23:32:38,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [315.5001, 405.6238, 392.9981, 367.30594, 421.13135, 339.5776, 133.84247, 393.814, 130.00427, 327.77972]
2025-05-01 23:32:38,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [153.0, 261.0, 213.0, 183.0, 223.0, 155.0, 75.0, 191.0, 78.0, 225.0]
2025-05-01 23:32:38,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (322.76) for latency ExtremeClogL1U23
2025-05-01 23:32:38,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-01 23:32:38,524 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 23:32:38,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 5/100 (estimated time remaining: 28 hours, 32 minutes, 38 seconds)
2025-05-01 23:49:31,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 23:49:31,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 23:50:44,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 334.02927 ± 111.094
2025-05-01 23:50:44,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [140.6344, 361.72095, 384.0911, 398.12183, 340.4389, 417.4425, 379.80112, 422.3192, 94.733185, 400.98978]
2025-05-01 23:50:44,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [90.0, 170.0, 177.0, 187.0, 164.0, 185.0, 176.0, 200.0, 53.0, 179.0]
2025-05-01 23:50:44,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (334.03) for latency ExtremeClogL1U23
2025-05-01 23:50:44,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-01 23:50:44,547 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 23:50:44,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 6/100 (estimated time remaining: 28 hours, 19 minutes, 45 seconds)
2025-05-02 00:07:13,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:07:13,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:08:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 376.41693 ± 128.923
2025-05-02 00:08:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [440.7762, 223.34276, 488.3135, 329.38138, 378.31927, 424.48093, 396.2419, 281.64883, 165.22086, 636.4433]
2025-05-02 00:08:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [249.0, 172.0, 270.0, 177.0, 175.0, 254.0, 199.0, 179.0, 108.0, 361.0]
2025-05-02 00:08:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (376.42) for latency ExtremeClogL1U23
2025-05-02 00:08:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 00:08:49,748 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 00:08:49,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 7/100 (estimated time remaining: 28 hours, 23 minutes, 37 seconds)
2025-05-02 00:25:55,486 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:25:55,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:27:04,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 320.88190 ± 94.873
2025-05-02 00:27:04,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [295.0219, 430.52988, 372.51825, 345.04062, 388.05267, 232.02519, 82.95991, 383.56436, 354.15936, 324.9466]
2025-05-02 00:27:04,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [120.0, 223.0, 175.0, 158.0, 178.0, 131.0, 56.0, 203.0, 160.0, 143.0]
2025-05-02 00:27:04,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 8/100 (estimated time remaining: 28 hours, 13 seconds)
2025-05-02 00:44:00,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 00:44:00,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 00:45:10,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 353.59235 ± 57.148
2025-05-02 00:45:10,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [435.0307, 320.85583, 342.4365, 353.12012, 294.49902, 319.75888, 372.57535, 460.06482, 263.6777, 373.9044]
2025-05-02 00:45:10,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [175.0, 134.0, 153.0, 165.0, 124.0, 154.0, 173.0, 193.0, 118.0, 157.0]
2025-05-02 00:45:10,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 9/100 (estimated time remaining: 27 hours, 48 minutes, 39 seconds)
2025-05-02 01:01:43,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:01:43,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:02:45,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 355.78717 ± 129.615
2025-05-02 01:02:45,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [371.02975, 303.48163, 26.318943, 488.99243, 498.7206, 351.8611, 375.02145, 481.3989, 351.3724, 309.67468]
2025-05-02 01:02:45,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [144.0, 123.0, 28.0, 189.0, 201.0, 138.0, 143.0, 170.0, 140.0, 135.0]
2025-05-02 01:02:45,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 10/100 (estimated time remaining: 27 hours, 20 minutes, 7 seconds)
2025-05-02 01:19:41,318 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:19:41,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:21:07,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 488.16309 ± 124.864
2025-05-02 01:21:07,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [486.06335, 403.28336, 698.45593, 474.2673, 717.33417, 452.9053, 290.1668, 491.8687, 485.42108, 381.86508]
2025-05-02 01:21:07,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [189.0, 157.0, 285.0, 198.0, 249.0, 181.0, 135.0, 212.0, 186.0, 154.0]
2025-05-02 01:21:07,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (488.16) for latency ExtremeClogL1U23
2025-05-02 01:21:07,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 01:21:07,240 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 01:21:07,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 11/100 (estimated time remaining: 27 hours, 6 minutes, 48 seconds)
2025-05-02 01:38:21,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:38:21,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:40:06,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 653.09546 ± 347.673
2025-05-02 01:40:06,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1005.2737, 353.1506, 1245.3033, 401.32867, 771.3627, 67.01823, 741.8862, 940.42566, 711.7173, 293.48807]
2025-05-02 01:40:06,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [359.0, 152.0, 417.0, 180.0, 272.0, 44.0, 243.0, 305.0, 249.0, 125.0]
2025-05-02 01:40:06,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (653.10) for latency ExtremeClogL1U23
2025-05-02 01:40:06,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 01:40:06,318 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 01:40:06,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 12/100 (estimated time remaining: 27 hours, 4 minutes, 42 seconds)
2025-05-02 01:56:57,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 01:56:57,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 01:58:51,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 668.40100 ± 291.016
2025-05-02 01:58:51,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [946.38477, 690.79346, 731.17737, 177.43971, 1007.41815, 615.5664, 1004.87115, 156.48515, 552.4365, 801.43726]
2025-05-02 01:58:51,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [322.0, 237.0, 258.0, 89.0, 366.0, 254.0, 366.0, 81.0, 196.0, 291.0]
2025-05-02 01:58:51,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (668.40) for latency ExtremeClogL1U23
2025-05-02 01:58:51,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 01:58:51,360 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 01:58:51,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 13/100 (estimated time remaining: 26 hours, 55 minutes, 13 seconds)
2025-05-02 02:16:33,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 02:16:33,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 02:18:27,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 594.96350 ± 272.434
2025-05-02 02:18:27,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [788.0872, 848.4952, 422.2978, 830.5376, 375.15985, 964.1123, 751.1287, 611.5773, 189.5723, 168.6661]
2025-05-02 02:18:27,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [343.0, 260.0, 179.0, 383.0, 181.0, 421.0, 327.0, 184.0, 92.0, 99.0]
2025-05-02 02:18:27,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 14/100 (estimated time remaining: 27 hours, 2 minutes, 59 seconds)
2025-05-02 02:35:19,238 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 02:35:19,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 02:37:19,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 716.82715 ± 321.050
2025-05-02 02:37:19,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1107.2251, 562.3006, 196.27953, 429.7718, 1060.3036, 362.37933, 563.1871, 775.0851, 1072.606, 1039.1332]
2025-05-02 02:37:19,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [368.0, 221.0, 110.0, 173.0, 349.0, 175.0, 212.0, 311.0, 357.0, 319.0]
2025-05-02 02:37:19,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (716.83) for latency ExtremeClogL1U23
2025-05-02 02:37:19,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 02:37:19,411 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 02:37:19,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 15/100 (estimated time remaining: 27 hours, 6 minutes, 31 seconds)
2025-05-02 02:52:43,687 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 02:52:43,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 02:54:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 860.32654 ± 453.242
2025-05-02 02:54:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [808.2362, 384.0945, 1508.1321, 1072.1508, 678.47266, 1504.6227, 215.66681, 382.77985, 1352.0353, 697.0748]
2025-05-02 02:54:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [284.0, 163.0, 477.0, 379.0, 245.0, 471.0, 102.0, 164.0, 403.0, 268.0]
2025-05-02 02:54:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (860.33) for latency ExtremeClogL1U23
2025-05-02 02:54:48,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 02:54:48,930 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 02:54:48,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 16/100 (estimated time remaining: 26 hours, 32 minutes, 48 seconds)
2025-05-02 03:10:12,953 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:10:12,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 03:13:26,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1148.43970 ± 848.367
2025-05-02 03:13:26,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [44.966278, 871.9572, 1156.7953, 1700.0851, 1785.2544, 2726.776, 434.70642, 125.1614, 2056.6353, 582.06067]
2025-05-02 03:13:26,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [40.0, 378.0, 459.0, 667.0, 712.0, 928.0, 188.0, 75.0, 741.0, 228.0]
2025-05-02 03:13:26,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1148.44) for latency ExtremeClogL1U23
2025-05-02 03:13:26,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 03:13:26,450 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 03:13:26,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 17/100 (estimated time remaining: 26 hours, 8 minutes, 2 seconds)
2025-05-02 03:29:03,860 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:29:03,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 03:30:55,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 762.43341 ± 603.885
2025-05-02 03:30:55,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [191.98613, 424.68237, 2172.7522, 355.2236, 1243.329, 191.85728, 1106.5393, 293.228, 1096.1346, 548.6016]
2025-05-02 03:30:55,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [93.0, 172.0, 694.0, 149.0, 393.0, 91.0, 360.0, 123.0, 355.0, 198.0]
2025-05-02 03:30:55,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 18/100 (estimated time remaining: 25 hours, 28 minutes, 23 seconds)
2025-05-02 03:46:07,043 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 03:46:07,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 03:47:52,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 751.82733 ± 308.425
2025-05-02 03:47:52,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [62.62876, 1033.2377, 1050.0109, 357.2781, 837.4333, 573.8712, 872.3703, 936.0918, 1003.6618, 791.69]
2025-05-02 03:47:52,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [44.0, 322.0, 327.0, 143.0, 277.0, 238.0, 275.0, 306.0, 318.0, 239.0]
2025-05-02 03:47:52,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 19/100 (estimated time remaining: 24 hours, 26 minutes, 33 seconds)
2025-05-02 04:01:43,674 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:01:43,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:03:44,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 908.36035 ± 491.214
2025-05-02 04:03:44,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [71.925316, 965.1978, 1581.5978, 1547.5576, 1181.2819, 719.77747, 1009.303, 856.6489, 1076.7922, 73.521935]
2025-05-02 04:03:44,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [42.0, 292.0, 477.0, 474.0, 362.0, 250.0, 312.0, 265.0, 318.0, 45.0]
2025-05-02 04:03:44,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 20/100 (estimated time remaining: 23 hours, 19 minutes, 55 seconds)
2025-05-02 04:18:59,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:18:59,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:20:11,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 507.75055 ± 300.631
2025-05-02 04:20:11,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [146.04425, 509.94394, 185.40356, 901.44916, 837.6953, 264.33215, 73.225746, 602.1744, 751.5301, 805.7065]
2025-05-02 04:20:11,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [90.0, 196.0, 110.0, 285.0, 261.0, 115.0, 55.0, 214.0, 235.0, 244.0]
2025-05-02 04:20:11,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 21/100 (estimated time remaining: 22 hours, 45 minutes, 59 seconds)
2025-05-02 04:34:18,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:34:18,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:35:59,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 717.39008 ± 401.284
2025-05-02 04:35:59,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [803.9976, 925.0668, 79.561134, 878.5335, 1474.1941, 1069.774, 390.3165, 219.58528, 853.4633, 479.40826]
2025-05-02 04:35:59,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [252.0, 277.0, 49.0, 275.0, 464.0, 331.0, 152.0, 105.0, 262.0, 182.0]
2025-05-02 04:35:59,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 44 minutes, 16 seconds)
2025-05-02 04:50:50,828 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 04:50:50,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 04:52:40,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 732.23669 ± 383.241
2025-05-02 04:52:40,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1286.9493, 1221.2562, 264.13382, 786.1405, 472.52765, 772.83417, 757.4709, 51.433537, 593.4803, 1116.14]
2025-05-02 04:52:40,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [395.0, 405.0, 133.0, 284.0, 178.0, 265.0, 270.0, 43.0, 222.0, 376.0]
2025-05-02 04:52:40,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 23/100 (estimated time remaining: 21 hours, 15 minutes, 14 seconds)
2025-05-02 05:06:16,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:06:16,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:08:08,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 815.95691 ± 426.774
2025-05-02 05:08:08,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [103.473854, 76.22551, 1052.4211, 1015.4939, 961.71277, 868.6986, 1590.3014, 662.4082, 960.2102, 868.6239]
2025-05-02 05:08:08,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [62.0, 56.0, 323.0, 300.0, 291.0, 299.0, 495.0, 230.0, 308.0, 268.0]
2025-05-02 05:08:08,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 36 minutes, 7 seconds)
2025-05-02 05:21:57,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:21:57,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:25:40,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1566.00391 ± 774.548
2025-05-02 05:25:40,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [843.41187, 912.7049, 1932.8344, 1280.7664, 1293.4634, 1324.1753, 930.1937, 2947.487, 1142.543, 3052.4587]
2025-05-02 05:25:40,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [291.0, 297.0, 646.0, 431.0, 459.0, 391.0, 309.0, 1000.0, 334.0, 977.0]
2025-05-02 05:25:40,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1566.00) for latency ExtremeClogL1U23
2025-05-02 05:25:40,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 05:25:40,662 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 05:25:40,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 45 minutes, 28 seconds)
2025-05-02 05:39:46,477 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:39:46,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:41:48,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 813.61993 ± 792.702
2025-05-02 05:41:48,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [340.90753, 87.48177, 260.44647, 777.6257, 42.626705, 805.1771, 701.45654, 1493.3181, 2852.0461, 775.11316]
2025-05-02 05:41:48,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [144.0, 61.0, 115.0, 279.0, 41.0, 271.0, 265.0, 471.0, 1000.0, 251.0]
2025-05-02 05:41:48,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 24 minutes, 17 seconds)
2025-05-02 05:55:27,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 05:55:27,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 05:57:21,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 780.91565 ± 501.450
2025-05-02 05:57:21,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1811.253, 938.7131, 436.6147, 896.0753, 986.37537, 1051.3555, 64.47856, 64.53135, 1057.7101, 502.0497]
2025-05-02 05:57:21,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [599.0, 287.0, 167.0, 278.0, 296.0, 353.0, 49.0, 53.0, 336.0, 193.0]
2025-05-02 05:57:21,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 4 minutes, 19 seconds)
2025-05-02 06:11:05,824 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:11:05,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 06:12:40,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 560.86584 ± 379.299
2025-05-02 06:12:40,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1082.7275, 81.63369, 63.59727, 940.511, 228.25905, 1168.5238, 345.90894, 653.2379, 460.26996, 583.9887]
2025-05-02 06:12:40,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [319.0, 63.0, 47.0, 289.0, 102.0, 361.0, 139.0, 266.0, 179.0, 214.0]
2025-05-02 06:12:40,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 27 minutes, 53 seconds)
2025-05-02 06:26:35,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:26:35,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 06:29:23,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1126.85083 ± 753.639
2025-05-02 06:29:23,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1345.1196, 159.41602, 856.51306, 719.5422, 1261.4656, 1163.0721, 2985.1501, 1525.5599, 1029.1079, 223.56326]
2025-05-02 06:29:23,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [397.0, 79.0, 292.0, 260.0, 413.0, 362.0, 1000.0, 459.0, 318.0, 102.0]
2025-05-02 06:29:23,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 29 minutes, 54 seconds)
2025-05-02 06:42:38,609 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:42:38,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 06:45:17,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1033.23279 ± 798.107
2025-05-02 06:45:17,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1077.4879, 153.9866, 2744.5334, 245.92616, 171.24663, 1092.3339, 2026.7891, 519.6604, 1046.4208, 1253.9432]
2025-05-02 06:45:17,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [351.0, 79.0, 858.0, 109.0, 106.0, 375.0, 696.0, 196.0, 309.0, 384.0]
2025-05-02 06:45:17,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 50 minutes, 26 seconds)
2025-05-02 06:58:47,195 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 06:58:47,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:01:23,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1134.59949 ± 471.709
2025-05-02 07:01:23,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2109.4788, 1074.6779, 1331.7074, 922.8251, 1119.6915, 1552.3363, 144.66902, 1032.7715, 1025.5188, 1032.3181]
2025-05-02 07:01:23,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [661.0, 356.0, 437.0, 301.0, 346.0, 452.0, 72.0, 314.0, 364.0, 346.0]
2025-05-02 07:01:23,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 34 minutes, 6 seconds)
2025-05-02 07:15:20,014 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:15:20,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:17:43,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1065.39490 ± 533.938
2025-05-02 07:17:43,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1473.8666, 1335.0892, 1117.2917, 680.2702, 625.85803, 1069.0278, 1244.8436, 2250.1287, 326.90802, 530.66614]
2025-05-02 07:17:43,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [461.0, 407.0, 351.0, 243.0, 223.0, 320.0, 373.0, 666.0, 135.0, 210.0]
2025-05-02 07:17:43,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 28 minutes, 57 seconds)
2025-05-02 07:31:11,482 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:31:11,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:33:38,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1098.88257 ± 643.965
2025-05-02 07:33:38,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1119.227, 967.9962, 2282.1794, 531.5714, 1889.6442, 172.20952, 215.33633, 1421.3145, 1093.8419, 1295.5057]
2025-05-02 07:33:38,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [330.0, 319.0, 739.0, 198.0, 570.0, 88.0, 99.0, 419.0, 323.0, 391.0]
2025-05-02 07:33:38,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 21 minutes, 15 seconds)
2025-05-02 07:47:36,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 07:47:36,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 07:50:52,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1345.51868 ± 926.524
2025-05-02 07:50:52,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [162.84073, 2008.4985, 274.61655, 1995.8097, 1124.7289, 3097.5625, 175.48007, 2072.5625, 998.0857, 1545.001]
2025-05-02 07:50:52,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [80.0, 643.0, 137.0, 675.0, 392.0, 1000.0, 85.0, 713.0, 318.0, 528.0]
2025-05-02 07:50:52,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 11 minutes, 50 seconds)
2025-05-02 08:04:40,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:04:40,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:06:37,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 777.66663 ± 926.181
2025-05-02 08:06:37,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1042.2438, 136.44623, 28.306335, 86.670975, 47.984455, 3065.0728, 1672.5876, 717.5657, 891.86127, 87.9266]
2025-05-02 08:06:37,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [325.0, 70.0, 30.0, 60.0, 34.0, 1000.0, 568.0, 257.0, 294.0, 52.0]
2025-05-02 08:06:37,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 53 minutes, 43 seconds)
2025-05-02 08:20:36,605 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:20:36,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:23:35,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1254.35840 ± 889.925
2025-05-02 08:23:35,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [393.357, 1100.7202, 1530.7313, 827.15155, 139.5735, 1060.092, 2667.4324, 905.7899, 3078.0735, 840.6626]
2025-05-02 08:23:35,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [157.0, 375.0, 474.0, 288.0, 85.0, 364.0, 875.0, 277.0, 1000.0, 299.0]
2025-05-02 08:23:35,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 48 minutes, 34 seconds)
2025-05-02 08:37:27,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:37:27,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:39:13,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 721.73401 ± 509.106
2025-05-02 08:39:13,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [742.5801, 1670.7635, 1209.3124, 131.4771, 242.42552, 1047.7228, 546.71075, 68.58085, 388.44446, 1169.3224]
2025-05-02 08:39:13,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [264.0, 521.0, 366.0, 70.0, 122.0, 330.0, 200.0, 56.0, 174.0, 367.0]
2025-05-02 08:39:13,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 23 minutes, 7 seconds)
2025-05-02 08:53:03,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 08:53:03,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 08:55:03,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 857.73651 ± 397.972
2025-05-02 08:55:03,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [118.330124, 1141.7274, 1488.4396, 573.5804, 841.9525, 1163.3453, 306.55972, 909.43854, 1133.3336, 900.65814]
2025-05-02 08:55:03,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [63.0, 363.0, 464.0, 202.0, 295.0, 376.0, 131.0, 293.0, 364.0, 285.0]
2025-05-02 08:55:03,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 5 minutes, 50 seconds)
2025-05-02 09:09:11,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:09:11,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:11:07,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 759.81097 ± 680.526
2025-05-02 09:11:07,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [651.14984, 120.491806, 600.0266, 597.5408, 2303.3523, 108.92762, 223.56775, 1235.9542, 242.10349, 1514.9955]
2025-05-02 09:11:07,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [252.0, 75.0, 220.0, 240.0, 685.0, 63.0, 101.0, 444.0, 105.0, 508.0]
2025-05-02 09:11:07,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 35 minutes, 5 seconds)
2025-05-02 09:24:23,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:24:23,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:26:30,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 862.88391 ± 566.510
2025-05-02 09:26:30,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [411.793, 923.3122, 1103.5848, 270.48044, 804.5223, 1190.51, 1889.7858, 364.5404, 67.638145, 1602.6719]
2025-05-02 09:26:30,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [181.0, 302.0, 380.0, 116.0, 304.0, 375.0, 577.0, 148.0, 51.0, 485.0]
2025-05-02 09:26:30,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 14 minutes, 34 seconds)
2025-05-02 09:40:07,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:40:07,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:42:04,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 729.65009 ± 824.229
2025-05-02 09:42:04,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [206.96788, 324.4565, 260.68143, 1471.3467, 260.0899, 813.78687, 2869.2073, 181.56749, 48.80171, 859.59546]
2025-05-02 09:42:04,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [110.0, 153.0, 112.0, 438.0, 110.0, 284.0, 874.0, 91.0, 40.0, 263.0]
2025-05-02 09:42:04,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 41 minutes, 51 seconds)
2025-05-02 09:55:52,966 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 09:55:52,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 09:58:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1012.38556 ± 632.148
2025-05-02 09:58:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1607.2515, 266.81638, 1056.5682, 831.8687, 2012.7017, 1239.5271, 134.49521, 1523.1962, 91.82845, 1359.603]
2025-05-02 09:58:16,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [481.0, 114.0, 356.0, 287.0, 620.0, 364.0, 71.0, 475.0, 57.0, 435.0]
2025-05-02 09:58:16,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 32 minutes, 54 seconds)
2025-05-02 10:12:15,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 10:12:15,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:16:27,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1774.91565 ± 990.461
2025-05-02 10:16:27,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1537.9769, 1866.4082, 477.28757, 2859.6223, 1386.3094, 946.5983, 3253.7976, 226.77538, 2360.5854, 2833.7942]
2025-05-02 10:16:27,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [486.0, 601.0, 201.0, 864.0, 468.0, 323.0, 1000.0, 103.0, 711.0, 913.0]
2025-05-02 10:16:27,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1774.92) for latency ExtremeClogL1U23
2025-05-02 10:16:27,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 10:16:27,535 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 10:16:27,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 44 minutes, 14 seconds)
2025-05-02 10:30:35,361 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 10:30:35,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:33:12,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1093.77026 ± 544.510
2025-05-02 10:33:12,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [869.59174, 1030.67, 1043.3391, 602.25055, 368.0377, 1793.6912, 1044.2577, 1353.1906, 588.0515, 2244.6223]
2025-05-02 10:33:12,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [280.0, 351.0, 355.0, 207.0, 155.0, 582.0, 326.0, 408.0, 216.0, 717.0]
2025-05-02 10:33:12,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 35 minutes, 50 seconds)
2025-05-02 10:46:35,204 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 10:46:35,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 10:49:07,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1056.83362 ± 827.930
2025-05-02 10:49:07,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [797.5723, 72.3245, 1609.2498, 79.67905, 2556.5798, 1901.3463, 163.17249, 533.8035, 1074.4037, 1780.2041]
2025-05-02 10:49:07,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [275.0, 58.0, 551.0, 56.0, 784.0, 613.0, 92.0, 195.0, 370.0, 581.0]
2025-05-02 10:49:07,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 25 minutes, 17 seconds)
2025-05-02 11:03:06,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:03:06,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:04:59,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 752.50183 ± 573.925
2025-05-02 11:04:59,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [308.8176, 150.84949, 1311.1069, 314.6527, 345.14023, 420.15738, 413.70248, 1007.244, 1978.0144, 1275.3328]
2025-05-02 11:04:59,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [128.0, 93.0, 383.0, 131.0, 140.0, 162.0, 166.0, 334.0, 639.0, 430.0]
2025-05-02 11:04:59,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 12 minutes, 9 seconds)
2025-05-02 11:18:51,364 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:18:51,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:20:42,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 804.61023 ± 418.533
2025-05-02 11:20:42,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [805.1123, 236.52095, 232.13478, 1104.1212, 1097.2437, 1116.856, 103.61645, 972.43945, 1107.4998, 1270.5579]
2025-05-02 11:20:42,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [277.0, 117.0, 120.0, 342.0, 336.0, 357.0, 58.0, 290.0, 338.0, 373.0]
2025-05-02 11:20:42,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 50 minutes, 11 seconds)
2025-05-02 11:34:50,163 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:34:50,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:36:43,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 809.20331 ± 554.635
2025-05-02 11:36:43,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [407.5164, 530.1394, 73.10459, 2033.9838, 154.0755, 1117.6356, 1073.5496, 1192.6896, 600.0258, 909.31287]
2025-05-02 11:36:43,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [157.0, 193.0, 58.0, 610.0, 87.0, 327.0, 314.0, 355.0, 214.0, 294.0]
2025-05-02 11:36:43,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 10 minutes, 45 seconds)
2025-05-02 11:50:20,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 11:50:20,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 11:52:00,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 574.83868 ± 545.843
2025-05-02 11:52:00,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1158.31, 132.17229, 69.27227, 537.79364, 371.67615, 282.58105, 366.83984, 127.989044, 1904.0416, 797.7109]
2025-05-02 11:52:00,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [345.0, 67.0, 52.0, 201.0, 160.0, 140.0, 160.0, 74.0, 625.0, 273.0]
2025-05-02 11:52:00,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 49/100 (estimated time remaining: 13 hours, 39 minutes, 22 seconds)
2025-05-02 12:06:24,744 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:06:24,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:08:08,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 700.30853 ± 525.858
2025-05-02 12:08:08,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1314.8815, 1119.8755, 65.0333, 1216.0685, 284.61038, 1440.3373, 595.259, 76.18493, 826.8582, 63.976917]
2025-05-02 12:08:08,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [401.0, 331.0, 39.0, 412.0, 134.0, 430.0, 219.0, 56.0, 300.0, 48.0]
2025-05-02 12:08:08,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 25 minutes, 52 seconds)
2025-05-02 12:21:24,160 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:21:24,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:22:52,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 604.86151 ± 474.166
2025-05-02 12:22:52,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [462.50507, 538.8521, 46.709656, 91.84278, 1549.7119, 750.452, 1030.448, 127.214905, 340.27557, 1110.6033]
2025-05-02 12:22:52,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [176.0, 199.0, 33.0, 59.0, 474.0, 266.0, 371.0, 76.0, 140.0, 339.0]
2025-05-02 12:22:52,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 58 minutes, 41 seconds)
2025-05-02 12:36:50,458 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:36:50,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:39:23,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1112.82568 ± 661.746
2025-05-02 12:39:23,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [846.6049, 2347.4126, 79.141426, 856.9627, 169.18222, 845.9782, 1628.0695, 1460.8126, 1249.0831, 1645.0095]
2025-05-02 12:39:23,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [286.0, 688.0, 47.0, 293.0, 96.0, 289.0, 488.0, 433.0, 372.0, 526.0]
2025-05-02 12:39:23,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 51 minutes, 5 seconds)
2025-05-02 12:53:40,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 12:53:40,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 12:56:55,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1461.18872 ± 804.792
2025-05-02 12:56:55,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1323.3239, 736.25073, 1932.3352, 3042.5356, 1365.568, 1462.7732, 160.09749, 2333.1238, 615.41754, 1640.461]
2025-05-02 12:56:55,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [414.0, 256.0, 602.0, 953.0, 449.0, 479.0, 84.0, 770.0, 227.0, 507.0]
2025-05-02 12:56:55,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 49 minutes, 58 seconds)
2025-05-02 13:11:13,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:11:13,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 13:13:53,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1123.91418 ± 380.418
2025-05-02 13:13:53,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1148.7388, 683.8328, 1086.8668, 1321.7635, 1166.5557, 1685.3649, 1624.4545, 1296.0471, 827.93304, 397.5851]
2025-05-02 13:13:53,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [378.0, 243.0, 343.0, 399.0, 387.0, 502.0, 478.0, 408.0, 277.0, 156.0]
2025-05-02 13:13:53,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 49 minutes, 47 seconds)
2025-05-02 13:27:23,517 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:27:23,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 13:30:11,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1123.23267 ± 614.536
2025-05-02 13:30:11,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1141.7886, 1474.8625, 249.00113, 414.258, 1923.0259, 236.97206, 1757.5961, 1617.4766, 1595.4613, 821.8856]
2025-05-02 13:30:11,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [335.0, 436.0, 108.0, 159.0, 625.0, 106.0, 529.0, 486.0, 475.0, 283.0]
2025-05-02 13:30:11,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 34 minutes, 51 seconds)
2025-05-02 13:43:36,560 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:43:36,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 13:45:07,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 628.17285 ± 527.614
2025-05-02 13:45:07,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [98.0823, 1844.8285, 126.87434, 134.93349, 1118.3326, 643.6294, 759.061, 135.3901, 793.90094, 626.69574]
2025-05-02 13:45:07,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [70.0, 548.0, 67.0, 84.0, 355.0, 229.0, 261.0, 70.0, 286.0, 226.0]
2025-05-02 13:45:07,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 20 minutes, 16 seconds)
2025-05-02 13:59:40,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 13:59:40,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:03:03,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1438.10815 ± 1137.122
2025-05-02 14:03:03,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [123.94565, 3196.423, 996.41345, 497.36624, 3125.0618, 1087.1023, 148.64154, 1289.1227, 2916.6602, 1000.3451]
2025-05-02 14:03:03,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [64.0, 1000.0, 338.0, 195.0, 1000.0, 364.0, 94.0, 448.0, 961.0, 357.0]
2025-05-02 14:03:03,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 16 minutes, 20 seconds)
2025-05-02 14:16:52,880 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 14:16:52,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:19:47,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1297.00708 ± 635.121
2025-05-02 14:19:47,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1631.2137, 1412.0626, 169.24184, 1018.0027, 1310.3412, 783.7201, 1096.4647, 1100.7417, 1725.3298, 2722.9517]
2025-05-02 14:19:47,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [527.0, 437.0, 79.0, 331.0, 421.0, 264.0, 340.0, 367.0, 513.0, 795.0]
2025-05-02 14:19:47,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 52 minutes, 39 seconds)
2025-05-02 14:34:07,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 14:34:07,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:37:16,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1347.61877 ± 942.601
2025-05-02 14:37:16,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1423.1101, 1167.361, 505.19266, 585.0414, 1302.5646, 2819.588, 246.65105, 3124.1367, 1837.4194, 465.12317]
2025-05-02 14:37:16,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [456.0, 348.0, 186.0, 214.0, 381.0, 887.0, 113.0, 1000.0, 595.0, 192.0]
2025-05-02 14:37:16,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 40 minutes, 27 seconds)
2025-05-02 14:52:28,268 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 14:52:28,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 14:57:15,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1993.16797 ± 936.683
2025-05-02 14:57:15,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3241.7397, 1654.3447, 3186.846, 1434.6425, 1326.968, 2820.1846, 477.9188, 2141.871, 2787.8997, 859.2647]
2025-05-02 14:57:15,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 495.0, 1000.0, 428.0, 434.0, 868.0, 189.0, 635.0, 818.0, 295.0]
2025-05-02 14:57:15,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (1993.17) for latency ExtremeClogL1U23
2025-05-02 14:57:15,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-02 14:57:15,939 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-02 14:57:15,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 54 minutes, 3 seconds)
2025-05-02 15:13:24,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 15:13:24,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 15:15:44,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 970.84637 ± 758.076
2025-05-02 15:15:44,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [177.35092, 244.67128, 1195.2754, 756.29236, 291.79565, 2172.8994, 1278.4904, 1323.5353, 2209.9624, 58.190674]
2025-05-02 15:15:44,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [83.0, 107.0, 404.0, 281.0, 134.0, 704.0, 413.0, 412.0, 651.0, 48.0]
2025-05-02 15:15:44,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 61/100 (estimated time remaining: 12 hours, 4 minutes, 58 seconds)
2025-05-02 15:33:39,448 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 15:33:39,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 15:36:37,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1043.06323 ± 809.288
2025-05-02 15:36:37,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3073.377, 1396.2528, 342.12503, 1371.1808, 664.901, 570.9859, 29.554646, 1092.2798, 1325.0231, 564.9517]
2025-05-02 15:36:37,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [916.0, 419.0, 139.0, 423.0, 236.0, 206.0, 31.0, 338.0, 430.0, 219.0]
2025-05-02 15:36:37,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 62/100 (estimated time remaining: 12 hours, 9 minutes, 47 seconds)
2025-05-02 15:54:21,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 15:54:21,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 15:57:32,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1175.94678 ± 528.298
2025-05-02 15:57:32,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [197.43582, 573.34143, 2054.8674, 1131.6179, 1407.5063, 1205.5164, 1528.8279, 1794.6064, 1022.25885, 843.4897]
2025-05-02 15:57:32,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [92.0, 222.0, 635.0, 378.0, 449.0, 355.0, 473.0, 541.0, 310.0, 248.0]
2025-05-02 15:57:32,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 63/100 (estimated time remaining: 12 hours, 22 minutes, 50 seconds)
2025-05-02 16:15:09,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 16:15:09,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 16:18:51,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1408.17407 ± 988.665
2025-05-02 16:18:51,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2097.3264, 1499.2169, 3253.6443, 177.50034, 1393.0704, 2830.1611, 384.29382, 417.11377, 1080.2831, 949.1308]
2025-05-02 16:18:51,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [616.0, 438.0, 1000.0, 82.0, 423.0, 847.0, 150.0, 158.0, 361.0, 315.0]
2025-05-02 16:18:51,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 64/100 (estimated time remaining: 12 hours, 31 minutes, 37 seconds)
2025-05-02 16:37:16,998 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 16:37:17,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 16:42:48,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1909.58374 ± 743.560
2025-05-02 16:42:48,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2419.725, 2055.424, 1694.4015, 3327.7046, 1518.9681, 452.10275, 1279.1426, 1971.9108, 2610.4111, 1766.0465]
2025-05-02 16:42:48,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [757.0, 624.0, 505.0, 1000.0, 461.0, 192.0, 412.0, 600.0, 786.0, 522.0]
2025-05-02 16:42:48,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 65/100 (estimated time remaining: 12 hours, 39 minutes, 56 seconds)
2025-05-02 17:00:59,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:00:59,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 17:03:38,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1138.48669 ± 488.599
2025-05-02 17:03:38,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1157.0305, 1408.6306, 1747.97, 235.55228, 1058.3271, 1761.5364, 1591.6035, 850.9984, 472.7659, 1100.4519]
2025-05-02 17:03:38,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [349.0, 423.0, 528.0, 104.0, 352.0, 542.0, 493.0, 299.0, 179.0, 382.0]
2025-05-02 17:03:38,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 66/100 (estimated time remaining: 12 hours, 35 minutes, 14 seconds)
2025-05-02 17:20:56,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:20:56,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 17:22:55,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 775.80151 ± 807.988
2025-05-02 17:22:55,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [794.5083, 76.28753, 172.37328, 1159.2073, 1915.517, 136.3695, 568.7917, 272.86728, 125.34414, 2536.749]
2025-05-02 17:22:55,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [270.0, 61.0, 83.0, 404.0, 621.0, 67.0, 205.0, 129.0, 67.0, 744.0]
2025-05-02 17:22:55,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 67/100 (estimated time remaining: 12 hours, 2 minutes, 50 seconds)
2025-05-02 17:39:41,460 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:39:41,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 17:42:30,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1269.86011 ± 1019.399
2025-05-02 17:42:30,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3028.0027, 643.14374, 164.18777, 2244.9924, 442.73547, 68.312584, 1589.8715, 1406.989, 477.54034, 2632.8252]
2025-05-02 17:42:30,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [890.0, 233.0, 78.0, 714.0, 172.0, 48.0, 519.0, 428.0, 177.0, 786.0]
2025-05-02 17:42:30,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 68/100 (estimated time remaining: 11 hours, 32 minutes, 47 seconds)
2025-05-02 17:59:55,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 17:59:55,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 18:01:38,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 684.76093 ± 566.833
2025-05-02 18:01:38,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [379.60413, 1402.8226, 86.60232, 334.5206, 48.971436, 1271.0157, 527.6783, 1305.7224, 1437.5312, 53.140594]
2025-05-02 18:01:38,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [153.0, 415.0, 52.0, 134.0, 41.0, 378.0, 191.0, 409.0, 436.0, 36.0]
2025-05-02 18:01:38,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 69/100 (estimated time remaining: 10 hours, 57 minutes, 50 seconds)
2025-05-02 18:17:09,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 18:17:09,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 18:20:27,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1284.65271 ± 900.629
2025-05-02 18:20:27,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [403.58334, 3231.9849, 701.3666, 606.528, 961.712, 2048.1208, 1457.3657, 1376.5543, 58.21935, 2001.0913]
2025-05-02 18:20:27,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [161.0, 1000.0, 252.0, 232.0, 312.0, 647.0, 448.0, 402.0, 39.0, 599.0]
2025-05-02 18:20:27,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 70/100 (estimated time remaining: 10 hours, 5 minutes, 23 seconds)
2025-05-02 18:36:49,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 18:36:49,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 18:40:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1349.71423 ± 739.036
2025-05-02 18:40:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1416.9833, 1514.1501, 410.797, 1617.6, 998.0153, 1483.7367, 3286.158, 769.0296, 1045.3896, 955.2828]
2025-05-02 18:40:21,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [432.0, 454.0, 161.0, 492.0, 332.0, 481.0, 1000.0, 260.0, 314.0, 322.0]
2025-05-02 18:40:21,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 71/100 (estimated time remaining: 9 hours, 40 minutes, 18 seconds)
2025-05-02 18:57:21,310 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 18:57:21,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:00:22,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1193.88721 ± 942.554
2025-05-02 19:00:22,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [93.18258, 542.07715, 427.81705, 1112.8358, 3193.3289, 122.52486, 899.81696, 1772.517, 2137.8677, 1636.9039]
2025-05-02 19:00:22,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [55.0, 214.0, 187.0, 322.0, 1000.0, 67.0, 312.0, 559.0, 680.0, 483.0]
2025-05-02 19:00:22,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 72/100 (estimated time remaining: 9 hours, 25 minutes, 14 seconds)
2025-05-02 19:18:08,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 19:18:08,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:21:01,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1148.14062 ± 1023.649
2025-05-02 19:21:01,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1398.7892, 115.284874, 294.07538, 369.0287, 1343.6031, 2126.5996, 72.35703, 369.8291, 3176.863, 2214.9768]
2025-05-02 19:21:01,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [417.0, 65.0, 122.0, 150.0, 402.0, 704.0, 56.0, 161.0, 1000.0, 696.0]
2025-05-02 19:21:01,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 73/100 (estimated time remaining: 9 hours, 11 minutes, 42 seconds)
2025-05-02 19:37:43,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 19:37:43,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:39:52,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 856.06281 ± 492.955
2025-05-02 19:39:52,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1154.3575, 980.7826, 1440.4506, 109.93334, 495.75906, 825.3052, 1425.8722, 1473.4639, 453.02844, 201.67494]
2025-05-02 19:39:52,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [344.0, 326.0, 474.0, 72.0, 182.0, 293.0, 424.0, 479.0, 173.0, 93.0]
2025-05-02 19:39:52,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 74/100 (estimated time remaining: 8 hours, 50 minutes, 27 seconds)
2025-05-02 19:55:46,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 19:55:46,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 19:57:08,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 524.71155 ± 579.563
2025-05-02 19:57:08,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [192.31877, 1851.6774, 468.85403, 61.406105, 345.30298, 163.8825, 157.30841, 44.487377, 1384.401, 577.4767]
2025-05-02 19:57:08,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [88.0, 566.0, 178.0, 51.0, 153.0, 83.0, 77.0, 40.0, 410.0, 210.0]
2025-05-02 19:57:08,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 75/100 (estimated time remaining: 8 hours, 22 minutes, 44 seconds)
2025-05-02 20:14:32,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 20:14:32,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 20:17:50,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1298.28992 ± 846.004
2025-05-02 20:17:50,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3351.1838, 1330.5464, 326.70712, 151.508, 713.46606, 1854.7722, 1311.7657, 1334.4053, 1425.6935, 1182.8502]
2025-05-02 20:17:50,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 391.0, 133.0, 77.0, 263.0, 555.0, 389.0, 414.0, 436.0, 401.0]
2025-05-02 20:17:50,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 76/100 (estimated time remaining: 8 hours, 7 minutes, 27 seconds)
2025-05-02 20:33:25,269 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 20:33:25,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 20:35:49,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 826.43585 ± 726.272
2025-05-02 20:35:49,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1400.6366, 815.4772, 668.5976, 1612.3839, 2438.9988, 230.43271, 39.208984, 164.60904, 469.39645, 424.6175]
2025-05-02 20:35:49,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [473.0, 262.0, 237.0, 487.0, 728.0, 103.0, 37.0, 83.0, 190.0, 164.0]
2025-05-02 20:35:49,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 38 minutes, 8 seconds)
2025-05-02 20:52:52,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 20:52:52,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 20:55:35,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1037.66467 ± 923.250
2025-05-02 20:55:35,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [55.737915, 576.3777, 1517.8785, 2893.1885, 548.47424, 1635.7966, 43.059483, 910.92694, 2113.4316, 81.77533]
2025-05-02 20:55:35,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [47.0, 212.0, 447.0, 906.0, 196.0, 534.0, 41.0, 276.0, 622.0, 51.0]
2025-05-02 20:55:35,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 78/100 (estimated time remaining: 7 hours, 15 minutes, 1 second)
2025-05-02 21:12:59,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 21:12:59,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 21:15:37,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1125.91394 ± 561.411
2025-05-02 21:15:37,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [724.5841, 1654.4825, 1103.1472, 1584.9929, 2047.7511, 1491.9893, 478.214, 918.20984, 99.87923, 1155.8896]
2025-05-02 21:15:37,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [249.0, 506.0, 327.0, 463.0, 606.0, 465.0, 190.0, 289.0, 57.0, 348.0]
2025-05-02 21:15:37,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 79/100 (estimated time remaining: 7 hours, 1 minute, 18 seconds)
2025-05-02 21:32:15,927 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 21:32:15,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 21:35:33,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1355.09363 ± 808.774
2025-05-02 21:35:33,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1182.8636, 147.01717, 1177.9064, 2928.9707, 1106.4713, 2165.0986, 323.62683, 2186.0396, 1159.6538, 1173.288]
2025-05-02 21:35:33,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [366.0, 74.0, 359.0, 864.0, 329.0, 643.0, 130.0, 655.0, 343.0, 349.0]
2025-05-02 21:35:33,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 53 minutes, 21 seconds)
2025-05-02 21:52:32,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 21:52:32,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 21:55:25,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1120.93408 ± 861.220
2025-05-02 21:55:25,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3006.113, 179.67291, 208.03928, 1065.6256, 710.2334, 313.5904, 2014.757, 677.5479, 1442.2126, 1591.5486]
2025-05-02 21:55:25,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [948.0, 84.0, 94.0, 366.0, 261.0, 131.0, 653.0, 247.0, 449.0, 516.0]
2025-05-02 21:55:25,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 81/100 (estimated time remaining: 6 hours, 30 minutes, 21 seconds)
2025-05-02 22:12:43,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 22:12:43,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 22:15:14,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 949.75604 ± 890.281
2025-05-02 22:15:14,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [510.24835, 1239.4426, 51.708267, 32.923557, 641.13275, 605.7602, 2027.2327, 853.3059, 3048.388, 487.41757]
2025-05-02 22:15:14,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [209.0, 368.0, 37.0, 45.0, 227.0, 241.0, 644.0, 264.0, 955.0, 190.0]
2025-05-02 22:15:14,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 17 minutes, 46 seconds)
2025-05-02 22:33:25,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 22:33:25,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 22:37:25,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1538.90039 ± 1201.581
2025-05-02 22:37:25,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1082.5966, 59.40936, 3128.0486, 88.25282, 2257.7932, 2679.3901, 3188.7517, 339.47702, 475.92133, 2089.363]
2025-05-02 22:37:25,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [365.0, 54.0, 1000.0, 50.0, 720.0, 815.0, 1000.0, 135.0, 192.0, 661.0]
2025-05-02 22:37:25,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 83/100 (estimated time remaining: 6 hours, 6 minutes, 37 seconds)
2025-05-02 22:53:38,492 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 22:53:38,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 22:56:33,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1022.52423 ± 909.323
2025-05-02 22:56:33,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1497.6351, 133.20984, 828.3414, 1713.1976, 923.34674, 68.143074, 3237.8018, 1108.821, 265.41873, 449.3276]
2025-05-02 22:56:33,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [451.0, 70.0, 290.0, 539.0, 326.0, 51.0, 1000.0, 332.0, 113.0, 192.0]
2025-05-02 22:56:33,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 43 minutes, 11 seconds)
2025-05-02 23:12:52,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 23:12:52,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 23:16:11,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1340.96301 ± 1004.987
2025-05-02 23:16:11,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2020.2684, 902.262, 2861.3513, 541.8866, 1177.9688, 1596.5005, 361.86728, 295.91248, 432.39185, 3219.2207]
2025-05-02 23:16:11,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [604.0, 305.0, 853.0, 205.0, 398.0, 528.0, 161.0, 122.0, 155.0, 1000.0]
2025-05-02 23:16:11,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 22 minutes)
2025-05-02 23:32:38,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 23:32:38,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 23:34:53,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 863.90247 ± 541.577
2025-05-02 23:34:53,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1335.5431, 1800.6511, 813.53033, 1133.5708, 1285.7845, 897.0979, 373.96707, 905.45996, 20.309774, 73.11047]
2025-05-02 23:34:53,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [384.0, 584.0, 278.0, 393.0, 414.0, 320.0, 155.0, 309.0, 25.0, 57.0]
2025-05-02 23:34:53,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 58 minutes, 22 seconds)
2025-05-02 23:52:06,910 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-02 23:52:06,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-02 23:55:34,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1456.52856 ± 1215.432
2025-05-02 23:55:34,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [345.80576, 2685.0542, 322.4647, 47.008778, 3213.2566, 983.5374, 584.44727, 2410.1387, 3254.6294, 718.94293]
2025-05-02 23:55:34,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [146.0, 812.0, 130.0, 49.0, 1000.0, 296.0, 213.0, 729.0, 961.0, 252.0]
2025-05-02 23:55:34,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 40 minutes, 57 seconds)
2025-05-03 00:11:46,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 00:11:46,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 00:15:28,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1584.98669 ± 1035.844
2025-05-03 00:15:28,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1894.8877, 1345.487, 2757.4783, 555.3376, 3168.2893, 2599.792, 52.043694, 1709.0125, 1676.2461, 91.293434]
2025-05-03 00:15:28,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [571.0, 397.0, 873.0, 199.0, 1000.0, 815.0, 53.0, 511.0, 531.0, 61.0]
2025-05-03 00:15:28,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 88/100 (estimated time remaining: 4 hours, 14 minutes, 54 seconds)
2025-05-03 00:32:45,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 00:32:45,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 00:36:48,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1738.55884 ± 992.533
2025-05-03 00:36:48,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3212.539, 769.09375, 1569.0996, 3202.9521, 1668.1196, 773.98096, 2823.1177, 1205.1165, 273.30566, 1888.2618]
2025-05-03 00:36:48,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 270.0, 511.0, 1000.0, 489.0, 267.0, 899.0, 401.0, 131.0, 622.0]
2025-05-03 00:36:48,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 89/100 (estimated time remaining: 4 hours, 36 seconds)
2025-05-03 00:54:08,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 00:54:08,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 00:57:08,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1145.57239 ± 965.803
2025-05-03 00:57:08,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [280.84396, 2796.162, 661.1691, 392.8976, 678.14594, 1019.93414, 507.9775, 3183.3923, 1281.8661, 653.3339]
2025-05-03 00:57:08,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [130.0, 884.0, 235.0, 171.0, 241.0, 327.0, 203.0, 1000.0, 423.0, 237.0]
2025-05-03 00:57:08,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 42 minutes, 6 seconds)
2025-05-03 01:12:46,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 01:12:46,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 01:17:00,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1453.03430 ± 1105.258
2025-05-03 01:17:00,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [2746.3752, 152.90729, 1185.7328, 2735.089, 84.24621, 1968.1553, 1629.2147, 543.88446, 332.78976, 3151.947]
2025-05-03 01:17:00,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [878.0, 76.0, 394.0, 869.0, 63.0, 655.0, 547.0, 200.0, 153.0, 1000.0]
2025-05-03 01:17:00,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 91/100 (estimated time remaining: 3 hours, 24 minutes, 13 seconds)
2025-05-03 01:34:14,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 01:34:14,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 01:36:50,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 962.46631 ± 736.083
2025-05-03 01:36:50,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [402.21014, 2254.3992, 185.19481, 1112.232, 966.00397, 33.702602, 367.09512, 2147.2122, 794.50995, 1362.1039]
2025-05-03 01:36:50,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [165.0, 668.0, 86.0, 367.0, 328.0, 32.0, 159.0, 630.0, 265.0, 439.0]
2025-05-03 01:36:50,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 92/100 (estimated time remaining: 3 hours, 2 minutes, 15 seconds)
2025-05-03 01:54:04,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 01:54:04,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 01:56:36,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 949.94202 ± 827.450
2025-05-03 01:56:36,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1143.3939, 904.4413, 2145.954, 741.2748, 2801.2336, 233.6519, 629.7724, 383.04175, 179.74445, 336.91174]
2025-05-03 01:56:36,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [394.0, 310.0, 683.0, 269.0, 887.0, 120.0, 226.0, 176.0, 93.0, 136.0]
2025-05-03 01:56:36,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 41 minutes, 48 seconds)
2025-05-03 02:14:53,209 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 02:14:53,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 02:18:20,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1470.09521 ± 768.374
2025-05-03 02:18:20,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1902.5233, 1206.6273, 1537.3932, 293.06412, 2042.0566, 840.2736, 848.00433, 1232.7069, 1573.2598, 3225.0427]
2025-05-03 02:18:20,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [615.0, 407.0, 449.0, 121.0, 615.0, 275.0, 274.0, 364.0, 461.0, 1000.0]
2025-05-03 02:18:20,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 22 minutes, 7 seconds)
2025-05-03 02:35:32,516 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 02:35:32,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 02:38:31,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1229.76196 ± 1202.367
2025-05-03 02:38:31,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [274.22418, 2334.0757, 3278.3613, 149.6757, 3226.8623, 1048.3639, 218.09946, 1225.3951, 72.79324, 469.76923]
2025-05-03 02:38:31,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [115.0, 715.0, 1000.0, 81.0, 1000.0, 347.0, 98.0, 412.0, 61.0, 189.0]
2025-05-03 02:38:31,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 95/100 (estimated time remaining: 2 hours, 1 minute, 38 seconds)
2025-05-03 02:53:09,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 02:53:09,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 02:55:59,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1188.28882 ± 1082.169
2025-05-03 02:55:59,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [334.99844, 714.7763, 473.17227, 1288.8975, 2364.3809, 2678.833, 427.364, 373.88477, 27.630144, 3198.9514]
2025-05-03 02:55:59,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [152.0, 247.0, 177.0, 439.0, 693.0, 833.0, 167.0, 166.0, 32.0, 1000.0]
2025-05-03 02:55:59,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 38 minutes, 59 seconds)
2025-05-03 03:10:29,683 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:10:29,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:14:53,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1968.34961 ± 1114.995
2025-05-03 03:14:53,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [3197.8276, 2054.4578, 2440.562, 358.9728, 55.138638, 3205.0513, 2608.6577, 1139.2815, 1423.8009, 3199.7468]
2025-05-03 03:14:53,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [1000.0, 609.0, 746.0, 163.0, 52.0, 1000.0, 786.0, 343.0, 439.0, 993.0]
2025-05-03 03:14:53,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 18 minutes, 26 seconds)
2025-05-03 03:28:19,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:28:19,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:30:38,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 936.19159 ± 559.470
2025-05-03 03:30:38,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [532.7472, 237.06157, 1428.562, 2016.6974, 1099.0144, 87.934044, 1233.2501, 856.8876, 1272.526, 597.23584]
2025-05-03 03:30:38,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [208.0, 122.0, 475.0, 661.0, 359.0, 70.0, 409.0, 300.0, 378.0, 227.0]
2025-05-03 03:30:38,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 98/100 (estimated time remaining: 56 minutes, 25 seconds)
2025-05-03 03:44:43,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:44:43,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:47:03,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 1248.35486 ± 1218.723
2025-05-03 03:47:03,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [211.57362, 291.7676, 3128.523, 795.6482, 44.608265, 2898.3918, 3138.451, 628.5656, 1080.8221, 265.1971]
2025-05-03 03:47:03,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [94.0, 122.0, 1000.0, 268.0, 45.0, 940.0, 1000.0, 222.0, 366.0, 130.0]
2025-05-03 03:47:03,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 99/100 (estimated time remaining: 35 minutes, 29 seconds)
2025-05-03 03:57:25,116 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 03:57:25,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 03:58:39,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 938.48010 ± 560.646
2025-05-03 03:58:39,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [411.74466, 1121.4293, 264.87878, 332.65427, 929.29767, 276.98123, 1405.1049, 1162.4885, 1638.2688, 1841.9534]
2025-05-03 03:58:39,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [159.0, 358.0, 113.0, 135.0, 284.0, 115.0, 419.0, 348.0, 498.0, 549.0]
2025-05-03 03:58:39,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1097 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 1 second)
2025-05-03 04:08:35,543 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-03 04:08:35,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-03 04:11:20,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1119 [DEBUG]: Total Reward: 2094.64893 ± 944.201
2025-05-03 04:11:20,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1120 [DEBUG]: All rewards: [1189.2632, 1157.4109, 2006.1288, 3055.4949, 3202.9688, 2073.7368, 584.97894, 1369.1263, 3088.4487, 3218.9336]
2025-05-03 04:11:20,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1121 [DEBUG]: All trajectory lengths: [366.0, 384.0, 647.0, 979.0, 1000.0, 642.0, 208.0, 464.0, 926.0, 1000.0]
2025-05-03 04:11:20,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1124 [INFO]: New best (2094.65) for latency ExtremeClogL1U23
2025-05-03 04:11:20,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1127 [INFO]: saving network
2025-05-03 04:11:20,093 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-hopper/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-03 04:11:20,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1149 [DEBUG]: Training session finished
