2025-09-13 03:35:21,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 03:35:21,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 03:35:21,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1524cbb89550>}
2025-09-13 03:35:21,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1111 [DEBUG]: using device: cuda
2025-09-13 03:35:21,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1133 [INFO]: Creating new trainer
2025-09-13 03:35:22,057 baseline-mbpac-noiseperc25-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-13 03:35:22,057 baseline-mbpac-noiseperc25-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 03:35:22,064 baseline-mbpac-noiseperc25-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 03:35:24,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1194 [DEBUG]: Starting training session...
2025-09-13 03:35:24,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 1/100
2025-09-13 03:46:06,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:46:06,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:46:15,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 45.13542 ± 21.823
2025-09-13 03:46:15,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [44.591705, 50.460407, 47.370953, 46.104416, 50.802097, 41.947067, 11.140749, 10.493806, 92.07966, 56.363384]
2025-09-13 03:46:15,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 29.0, 28.0, 27.0, 30.0, 25.0, 18.0, 27.0, 58.0, 32.0]
2025-09-13 03:46:15,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (45.14) for latency ExtremeSparseL4U32
2025-09-13 03:46:15,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 17 hours, 54 minutes, 18 seconds)
2025-09-13 03:56:39,791 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:56:39,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:56:57,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 81.31805 ± 59.311
2025-09-13 03:56:57,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.0727625, 17.836727, 81.10953, 23.733185, 61.87269, 164.39615, 68.640335, 81.72758, 96.517845, 205.27377]
2025-09-13 03:56:57,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 62.0, 30.0, 63.0, 98.0, 53.0, 50.0, 83.0, 112.0]
2025-09-13 03:56:57,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (81.32) for latency ExtremeSparseL4U32
2025-09-13 03:56:57,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 36 minutes, 22 seconds)
2025-09-13 04:07:17,718 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:07:17,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:07:41,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 128.92990 ± 138.852
2025-09-13 04:07:41,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [80.21346, 510.32745, 44.875423, 61.607822, 206.93506, 129.39005, 37.21861, 146.13847, 62.8387, 9.753954]
2025-09-13 04:07:41,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 313.0, 32.0, 46.0, 107.0, 82.0, 29.0, 80.0, 45.0, 12.0]
2025-09-13 04:07:41,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (128.93) for latency ExtremeSparseL4U32
2025-09-13 04:07:41,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 24 minutes, 2 seconds)
2025-09-13 04:18:02,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:18:02,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:18:23,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 121.68504 ± 77.056
2025-09-13 04:18:23,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.436046, 14.553566, 145.52892, 259.943, 116.1304, 140.36917, 84.25983, 72.274025, 231.07454, 139.28073]
2025-09-13 04:18:23,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 21.0, 97.0, 126.0, 61.0, 72.0, 50.0, 56.0, 95.0, 70.0]
2025-09-13 04:18:23,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 11 minutes, 36 seconds)
2025-09-13 04:28:44,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:28:44,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:29:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 138.45535 ± 36.126
2025-09-13 04:29:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [97.113205, 205.18166, 88.39479, 127.83782, 189.30017, 128.66333, 104.29942, 150.13861, 155.36902, 138.25562]
2025-09-13 04:29:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 121.0, 53.0, 80.0, 104.0, 94.0, 74.0, 81.0, 82.0, 72.0]
2025-09-13 04:29:09,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (138.46) for latency ExtremeSparseL4U32
2025-09-13 04:29:09,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 1 minute, 18 seconds)
2025-09-13 04:39:33,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:39:33,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:39:56,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 152.81573 ± 76.249
2025-09-13 04:39:56,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [164.83102, 148.01817, 288.07172, 162.49065, 248.12129, 155.96815, 61.00165, 24.053549, 187.9496, 87.651566]
2025-09-13 04:39:56,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 74.0, 138.0, 79.0, 115.0, 92.0, 39.0, 21.0, 86.0, 53.0]
2025-09-13 04:39:56,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (152.82) for latency ExtremeSparseL4U32
2025-09-13 04:39:56,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 16 hours, 49 minutes, 24 seconds)
2025-09-13 04:50:21,271 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:50:21,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:50:37,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 96.93754 ± 113.871
2025-09-13 04:50:37,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [236.97, 60.53156, 8.4847355, 20.46452, 15.776888, 20.623785, 15.575501, 22.271635, 294.74167, 273.93506]
2025-09-13 04:50:37,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 36.0, 12.0, 32.0, 18.0, 20.0, 23.0, 21.0, 160.0, 112.0]
2025-09-13 04:50:37,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 16 hours, 38 minutes, 9 seconds)
2025-09-13 05:01:10,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:01:10,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:01:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 196.88815 ± 70.527
2025-09-13 05:01:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [216.24426, 227.98291, 266.43137, 116.79997, 185.1672, 234.43073, 236.27283, 268.93576, 189.44579, 27.170603]
2025-09-13 05:01:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 100.0, 123.0, 67.0, 94.0, 107.0, 113.0, 123.0, 109.0, 28.0]
2025-09-13 05:01:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (196.89) for latency ExtremeSparseL4U32
2025-09-13 05:01:39,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 16 hours, 32 minutes, 59 seconds)
2025-09-13 05:11:57,645 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:11:57,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:12:26,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 212.38643 ± 118.603
2025-09-13 05:12:26,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [195.22325, 259.71533, 105.05798, 248.29471, 21.440248, 10.372988, 320.1326, 344.9899, 315.17612, 303.46106]
2025-09-13 05:12:26,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 120.0, 59.0, 119.0, 29.0, 13.0, 133.0, 135.0, 133.0, 137.0]
2025-09-13 05:12:26,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (212.39) for latency ExtremeSparseL4U32
2025-09-13 05:12:26,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 23 minutes, 49 seconds)
2025-09-13 05:22:50,872 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:22:50,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:23:17,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 199.28175 ± 87.037
2025-09-13 05:23:17,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [256.8586, 112.8567, 159.48907, 392.15604, 131.4104, 316.4899, 158.89566, 144.8147, 168.45433, 151.39192]
2025-09-13 05:23:17,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 61.0, 77.0, 149.0, 66.0, 123.0, 78.0, 78.0, 78.0, 92.0]
2025-09-13 05:23:17,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 14 minutes, 33 seconds)
2025-09-13 05:33:47,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:33:47,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:34:13,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 158.71066 ± 173.189
2025-09-13 05:34:13,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [103.151375, 223.81802, 642.2437, 168.81018, 63.17844, 159.27101, 123.21582, 71.16107, 16.470148, 15.787018]
2025-09-13 05:34:13,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 113.0, 306.0, 91.0, 38.0, 110.0, 64.0, 45.0, 20.0, 20.0]
2025-09-13 05:34:13,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 6 minutes, 12 seconds)
2025-09-13 05:44:33,584 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:44:33,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:45:03,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 233.44064 ± 177.760
2025-09-13 05:45:03,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [86.82076, 228.49039, 532.4156, 73.36475, 72.467995, 361.2455, 65.02697, 401.50446, 57.120956, 455.94907]
2025-09-13 05:45:03,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 101.0, 201.0, 46.0, 45.0, 149.0, 51.0, 157.0, 38.0, 173.0]
2025-09-13 05:45:03,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (233.44) for latency ExtremeSparseL4U32
2025-09-13 05:45:03,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 15 hours, 58 minutes, 2 seconds)
2025-09-13 05:55:30,002 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:55:30,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:56:18,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 390.73358 ± 231.957
2025-09-13 05:56:18,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [437.59485, 264.50546, 785.05066, 409.14044, 22.552048, 176.26337, 636.9085, 350.2427, 656.8125, 168.26524]
2025-09-13 05:56:18,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 129.0, 276.0, 167.0, 52.0, 151.0, 217.0, 153.0, 245.0, 101.0]
2025-09-13 05:56:18,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (390.73) for latency ExtremeSparseL4U32
2025-09-13 05:56:18,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 51 minutes, 3 seconds)
2025-09-13 06:06:44,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:06:44,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:07:00,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 85.22295 ± 78.893
2025-09-13 06:07:00,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [181.33307, 227.7641, 8.098551, 27.21424, 10.533271, 144.69853, 152.8386, 65.71464, 16.718193, 17.316332]
2025-09-13 06:07:00,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 130.0, 10.0, 24.0, 31.0, 83.0, 84.0, 40.0, 20.0, 29.0]
2025-09-13 06:07:00,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 38 minutes, 34 seconds)
2025-09-13 06:17:30,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:17:30,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:17:58,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 217.65259 ± 233.124
2025-09-13 06:17:58,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [151.3667, 379.44855, 316.85825, 238.0295, 17.794933, 8.690528, 18.44987, 6.970583, 243.06444, 795.85266]
2025-09-13 06:17:58,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 157.0, 129.0, 102.0, 22.0, 11.0, 21.0, 14.0, 110.0, 273.0]
2025-09-13 06:17:58,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 29 minutes, 29 seconds)
2025-09-13 06:28:20,727 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:28:20,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:28:53,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 242.26802 ± 157.792
2025-09-13 06:28:53,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.259495, 204.75153, 509.7061, 150.64827, 279.22864, 19.396547, 359.85272, 307.5673, 143.30151, 433.96808]
2025-09-13 06:28:53,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 104.0, 195.0, 76.0, 114.0, 27.0, 164.0, 144.0, 85.0, 160.0]
2025-09-13 06:28:53,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 18 minutes, 22 seconds)
2025-09-13 06:39:26,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:39:26,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:39:36,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 55.75353 ± 85.300
2025-09-13 06:39:36,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.136343, 13.956161, 13.955951, 11.23741, 14.058395, 10.989357, 12.942915, 18.704527, 224.40274, 228.15146]
2025-09-13 06:39:36,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 18.0, 19.0, 14.0, 21.0, 18.0, 15.0, 18.0, 108.0, 97.0]
2025-09-13 06:39:37,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 5 minutes, 40 seconds)
2025-09-13 06:49:51,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:49:51,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 180.64072 ± 167.230
2025-09-13 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [97.49618, 180.2632, 79.525795, 12.048897, 511.92993, 114.15164, 186.60909, 14.772069, 129.38687, 480.22363]
2025-09-13 06:50:17,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 82.0, 55.0, 14.0, 198.0, 66.0, 89.0, 21.0, 93.0, 163.0]
2025-09-13 06:50:17,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 14 hours, 45 minutes, 5 seconds)
2025-09-13 07:00:51,891 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:00:51,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:01:12,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 125.73613 ± 89.057
2025-09-13 07:01:12,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [169.59178, 223.76585, 11.752673, 222.495, 94.83148, 252.80492, 162.51096, 96.39272, 6.1697116, 17.0463]
2025-09-13 07:01:12,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 113.0, 13.0, 98.0, 54.0, 141.0, 91.0, 56.0, 15.0, 19.0]
2025-09-13 07:01:12,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 14 hours, 38 minutes, 10 seconds)
2025-09-13 07:11:34,473 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:11:34,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:11:57,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 156.93564 ± 137.586
2025-09-13 07:11:57,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [314.2292, 6.361343, 8.815997, 16.571022, 91.95856, 27.405565, 304.0463, 383.20624, 184.81693, 231.94524]
2025-09-13 07:11:57,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 9.0, 17.0, 26.0, 55.0, 28.0, 128.0, 150.0, 86.0, 121.0]
2025-09-13 07:11:57,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 23 minutes, 43 seconds)
2025-09-13 07:22:20,011 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:22:20,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:23:09,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 418.40414 ± 170.915
2025-09-13 07:23:09,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [429.85358, 685.9804, 315.89023, 507.57117, 275.774, 186.79591, 288.63144, 285.72748, 714.9923, 492.82498]
2025-09-13 07:23:09,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 268.0, 124.0, 193.0, 113.0, 91.0, 149.0, 117.0, 230.0, 173.0]
2025-09-13 07:23:09,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (418.40) for latency ExtremeSparseL4U32
2025-09-13 07:23:09,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 17 minutes, 26 seconds)
2025-09-13 07:33:36,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:33:36,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:33:50,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 83.58740 ± 88.646
2025-09-13 07:33:50,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [265.5888, 9.654934, 12.042076, 188.3732, 155.33134, 43.677094, 125.39897, 13.223928, 11.108622, 11.474982]
2025-09-13 07:33:50,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 13.0, 16.0, 106.0, 75.0, 32.0, 67.0, 14.0, 16.0, 13.0]
2025-09-13 07:33:50,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 5 minutes, 57 seconds)
2025-09-13 07:44:16,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:44:16,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:44:46,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 234.33102 ± 121.146
2025-09-13 07:44:46,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [155.0956, 73.417854, 417.30807, 272.45322, 262.05286, 161.59128, 130.65634, 478.92657, 201.85728, 189.95126]
2025-09-13 07:44:46,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 44.0, 174.0, 119.0, 108.0, 78.0, 68.0, 160.0, 103.0, 101.0]
2025-09-13 07:44:46,928 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 13 hours, 59 minutes, 14 seconds)
2025-09-13 07:55:21,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:55:21,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:56:05,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 362.57236 ± 185.736
2025-09-13 07:56:05,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [148.1461, 117.980064, 586.0895, 579.27985, 471.1584, 405.01126, 117.32972, 437.51047, 205.2936, 557.9245]
2025-09-13 07:56:05,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 63.0, 250.0, 223.0, 183.0, 150.0, 61.0, 167.0, 90.0, 202.0]
2025-09-13 07:56:05,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 54 minutes, 3 seconds)
2025-09-13 08:06:30,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:06:30,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:07:08,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 314.82941 ± 255.361
2025-09-13 08:07:08,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [624.3577, 258.74512, 18.953304, 14.187417, 12.4865465, 305.89597, 116.1268, 575.8354, 590.1865, 631.5189]
2025-09-13 08:07:08,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [226.0, 112.0, 19.0, 17.0, 15.0, 134.0, 62.0, 234.0, 222.0, 210.0]
2025-09-13 08:07:08,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 47 minutes, 41 seconds)
2025-09-13 08:17:28,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:17:28,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:18:17,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 365.86795 ± 214.531
2025-09-13 08:18:17,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [163.19243, 98.75767, 110.66563, 308.80283, 655.11865, 562.3247, 337.30087, 587.53125, 643.19543, 191.79036]
2025-09-13 08:18:17,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 62.0, 82.0, 148.0, 248.0, 247.0, 158.0, 244.0, 246.0, 99.0]
2025-09-13 08:18:17,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 35 minutes, 50 seconds)
2025-09-13 08:28:56,217 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:28:56,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:29:38,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 333.97205 ± 291.682
2025-09-13 08:29:38,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [489.37952, 66.35588, 270.1714, 252.30612, 619.4945, 1017.283, 263.97385, 30.347761, 309.74554, 20.662916]
2025-09-13 08:29:38,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [195.0, 41.0, 124.0, 107.0, 237.0, 374.0, 116.0, 29.0, 149.0, 28.0]
2025-09-13 08:29:38,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 34 minutes, 36 seconds)
2025-09-13 08:39:59,865 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:39:59,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:40:22,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 136.11020 ± 135.905
2025-09-13 08:40:22,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [126.72197, 486.32083, 234.98163, 57.201637, 16.691477, 88.280205, 92.54407, 201.63086, 26.796333, 29.932972]
2025-09-13 08:40:22,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 197.0, 106.0, 47.0, 20.0, 67.0, 51.0, 131.0, 30.0, 28.0]
2025-09-13 08:40:22,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 20 minutes, 28 seconds)
2025-09-13 08:50:38,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:50:38,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:51:06,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 192.02400 ± 124.909
2025-09-13 08:51:06,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [358.3522, 356.45847, 178.1075, 129.0505, 137.1796, 125.42836, 148.45422, 398.22122, 74.24483, 14.7431]
2025-09-13 08:51:06,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 159.0, 97.0, 77.0, 70.0, 64.0, 72.0, 153.0, 46.0, 18.0]
2025-09-13 08:51:06,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 1 minute, 10 seconds)
2025-09-13 09:01:34,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:01:34,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:02:13,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 307.75562 ± 187.062
2025-09-13 09:02:13,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [318.15073, 256.9456, 743.0436, 309.41907, 209.1503, 469.60788, 17.54865, 117.323784, 294.39413, 341.9726]
2025-09-13 09:02:13,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 107.0, 250.0, 128.0, 97.0, 172.0, 20.0, 74.0, 131.0, 162.0]
2025-09-13 09:02:13,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 12 hours, 51 minutes, 16 seconds)
2025-09-13 09:12:38,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:12:38,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:13:06,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 207.73531 ± 227.472
2025-09-13 09:13:06,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [176.82188, 55.438293, 12.039896, 729.1158, 226.72946, 398.93167, 27.355598, 22.05559, 413.34167, 15.523055]
2025-09-13 09:13:06,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 40.0, 13.0, 287.0, 99.0, 159.0, 29.0, 22.0, 179.0, 20.0]
2025-09-13 09:13:06,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 36 minutes, 39 seconds)
2025-09-13 09:23:41,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:23:41,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:24:35,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 468.91791 ± 302.752
2025-09-13 09:24:35,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [98.800606, 236.89081, 812.81866, 349.96893, 633.6765, 658.03925, 62.405098, 227.21353, 1006.283, 603.0827]
2025-09-13 09:24:35,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 113.0, 325.0, 140.0, 207.0, 266.0, 38.0, 103.0, 343.0, 218.0]
2025-09-13 09:24:35,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (468.92) for latency ExtremeSparseL4U32
2025-09-13 09:24:35,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 27 minutes, 24 seconds)
2025-09-13 09:34:53,111 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:34:53,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:35:42,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 380.20856 ± 294.779
2025-09-13 09:35:42,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [446.29117, 312.21884, 180.72998, 134.92786, 320.23553, 216.35439, 221.20915, 97.44538, 990.03546, 882.6378]
2025-09-13 09:35:42,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 127.0, 86.0, 71.0, 140.0, 97.0, 103.0, 69.0, 439.0, 343.0]
2025-09-13 09:35:42,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 21 minutes, 26 seconds)
2025-09-13 09:46:07,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:46:07,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:46:47,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 335.50223 ± 314.693
2025-09-13 09:46:47,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.942634, 25.56205, 15.159658, 158.44325, 652.3307, 148.3327, 644.1537, 227.43297, 949.94507, 522.71954]
2025-09-13 09:46:47,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 23.0, 78.0, 207.0, 73.0, 275.0, 106.0, 332.0, 175.0]
2025-09-13 09:46:47,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 15 minutes, 11 seconds)
2025-09-13 09:57:15,882 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:57:15,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:57:41,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 170.00652 ± 225.909
2025-09-13 09:57:41,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.628311, 22.23042, 8.929274, 10.376171, 136.07271, 20.588617, 782.6075, 212.07906, 211.94908, 277.60406]
2025-09-13 09:57:41,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 13.0, 25.0, 75.0, 25.0, 311.0, 94.0, 95.0, 150.0]
2025-09-13 09:57:41,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 1 minute, 1 second)
2025-09-13 10:08:10,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:08:10,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:08:51,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 348.13507 ± 231.008
2025-09-13 10:08:51,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [496.12076, 398.3728, 171.9214, 563.2954, 22.132587, 361.76108, 95.30895, 448.4648, 126.19965, 797.77313]
2025-09-13 10:08:51,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 164.0, 81.0, 183.0, 20.0, 142.0, 59.0, 172.0, 67.0, 252.0]
2025-09-13 10:08:51,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 11 hours, 53 minutes, 26 seconds)
2025-09-13 10:19:10,870 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:19:10,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:19:41,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 259.78119 ± 270.643
2025-09-13 10:19:41,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [105.48409, 14.337347, 16.97079, 11.310046, 16.213005, 305.20895, 850.31915, 273.9176, 506.64685, 497.40387]
2025-09-13 10:19:41,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 18.0, 16.0, 14.0, 17.0, 134.0, 267.0, 118.0, 174.0, 187.0]
2025-09-13 10:19:41,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 34 minutes, 9 seconds)
2025-09-13 10:30:05,750 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:30:05,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:30:46,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 324.72421 ± 352.965
2025-09-13 10:30:46,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [650.5519, 1017.0867, 204.4188, 671.5101, 588.1933, 32.38836, 10.386204, 50.1949, 10.077382, 12.434577]
2025-09-13 10:30:46,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 378.0, 91.0, 268.0, 239.0, 30.0, 12.0, 56.0, 15.0, 16.0]
2025-09-13 10:30:46,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 22 minutes, 55 seconds)
2025-09-13 10:41:20,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:41:21,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:42:13,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 455.07422 ± 276.304
2025-09-13 10:42:13,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [592.7106, 263.98093, 296.84708, 938.89294, 116.6197, 661.9005, 480.38086, 114.58161, 259.83475, 824.9933]
2025-09-13 10:42:13,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [223.0, 108.0, 138.0, 348.0, 61.0, 216.0, 192.0, 59.0, 107.0, 270.0]
2025-09-13 10:42:13,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 16 minutes, 9 seconds)
2025-09-13 10:52:36,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:52:36,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:53:15,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 313.94592 ± 280.471
2025-09-13 10:53:15,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [439.86124, 93.11492, 954.70404, 55.84694, 403.05362, 151.01073, 112.723274, 162.51474, 645.3245, 121.3051]
2025-09-13 10:53:15,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 52.0, 347.0, 33.0, 177.0, 75.0, 60.0, 98.0, 213.0, 65.0]
2025-09-13 10:53:15,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 6 minutes, 47 seconds)
2025-09-13 11:03:45,243 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:03:45,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:04:34,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 441.36230 ± 264.945
2025-09-13 11:04:34,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [1041.9427, 136.38918, 420.39023, 577.1856, 683.28296, 267.9233, 376.69095, 291.9847, 507.88995, 109.943436]
2025-09-13 11:04:34,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [360.0, 68.0, 168.0, 210.0, 218.0, 113.0, 152.0, 113.0, 204.0, 58.0]
2025-09-13 11:04:34,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 10 hours, 57 minutes, 32 seconds)
2025-09-13 11:15:00,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:15:00,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:15:41,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 365.50577 ± 282.240
2025-09-13 11:15:41,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [104.28463, 281.7147, 691.37604, 601.14343, 332.08044, 118.986084, 251.9221, 12.118781, 961.8299, 299.6016]
2025-09-13 11:15:41,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 120.0, 223.0, 227.0, 143.0, 62.0, 112.0, 17.0, 309.0, 124.0]
2025-09-13 11:15:41,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 10 hours, 49 minutes, 42 seconds)
2025-09-13 11:25:58,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:25:58,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:26:24,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 191.12593 ± 140.987
2025-09-13 11:26:24,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [129.56107, 19.47651, 8.250516, 86.98501, 464.80478, 338.65704, 107.97788, 298.4376, 269.3876, 187.7215]
2025-09-13 11:26:24,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 20.0, 12.0, 51.0, 195.0, 136.0, 60.0, 121.0, 118.0, 87.0]
2025-09-13 11:26:24,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 10 hours, 34 minutes, 17 seconds)
2025-09-13 11:36:49,183 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:36:49,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:37:32,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 337.93481 ± 223.966
2025-09-13 11:37:32,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [184.65009, 763.77594, 89.75454, 177.90707, 554.80066, 209.01195, 386.66583, 28.805233, 508.02383, 475.953]
2025-09-13 11:37:32,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 261.0, 54.0, 86.0, 229.0, 100.0, 158.0, 26.0, 243.0, 161.0]
2025-09-13 11:37:32,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 19 minutes, 32 seconds)
2025-09-13 11:47:59,617 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:47:59,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:48:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 232.13777 ± 224.947
2025-09-13 11:48:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [238.14449, 362.80875, 113.840866, 113.763336, 25.036316, 20.438704, 12.23006, 681.4376, 578.1821, 175.49544]
2025-09-13 11:48:28,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 140.0, 68.0, 66.0, 23.0, 18.0, 15.0, 280.0, 186.0, 80.0]
2025-09-13 11:48:28,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 7 minutes, 28 seconds)
2025-09-13 11:59:05,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:59:05,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:59:57,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 436.44531 ± 351.916
2025-09-13 11:59:57,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [375.91888, 561.06506, 369.04196, 139.17459, 289.66644, 361.05222, 86.45987, 531.4783, 248.592, 1402.0037]
2025-09-13 11:59:57,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 206.0, 149.0, 74.0, 126.0, 144.0, 55.0, 184.0, 110.0, 503.0]
2025-09-13 11:59:57,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 9 hours, 58 minutes, 8 seconds)
2025-09-13 12:10:15,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:10:15,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:11:06,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 436.31293 ± 356.445
2025-09-13 12:11:06,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [504.1705, 291.35834, 117.19458, 80.30768, 847.37286, 160.81448, 701.65283, 476.9948, 1162.2279, 21.035679]
2025-09-13 12:11:06,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 128.0, 61.0, 47.0, 291.0, 85.0, 281.0, 192.0, 414.0, 24.0]
2025-09-13 12:11:06,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 9 hours, 47 minutes, 21 seconds)
2025-09-13 12:21:30,088 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:21:30,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:22:21,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 469.39313 ± 234.068
2025-09-13 12:22:21,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [275.64133, 707.93115, 839.24146, 304.19012, 510.257, 564.32635, 389.0665, 15.080402, 709.4188, 378.77786]
2025-09-13 12:22:21,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 243.0, 261.0, 124.0, 180.0, 212.0, 159.0, 18.0, 242.0, 153.0]
2025-09-13 12:22:21,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (469.39) for latency ExtremeSparseL4U32
2025-09-13 12:22:21,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 9 hours, 41 minutes, 47 seconds)
2025-09-13 12:32:56,665 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:32:56,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:33:21,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 173.04239 ± 164.141
2025-09-13 12:33:21,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [22.011461, 154.19899, 236.77853, 107.02489, 91.74647, 19.137335, 9.977407, 375.46313, 547.12604, 166.95975]
2025-09-13 12:33:21,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 78.0, 106.0, 59.0, 56.0, 18.0, 23.0, 155.0, 200.0, 86.0]
2025-09-13 12:33:21,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 29 minutes, 19 seconds)
2025-09-13 12:43:40,068 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:43:40,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:44:17,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 317.86420 ± 290.263
2025-09-13 12:44:17,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [617.9212, 9.419846, 16.032595, 10.364782, 197.8412, 525.18616, 120.83956, 850.6803, 605.77954, 224.5768]
2025-09-13 12:44:17,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 11.0, 21.0, 15.0, 102.0, 178.0, 63.0, 323.0, 194.0, 102.0]
2025-09-13 12:44:17,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 18 minutes, 6 seconds)
2025-09-13 12:54:53,644 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:54:53,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:55:04,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 54.38506 ± 107.584
2025-09-13 12:55:04,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [73.835464, 6.5140076, 11.636672, 8.229452, 372.21725, 19.792719, 9.376703, 13.020453, 14.118686, 15.1091175]
2025-09-13 12:55:04,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 10.0, 16.0, 10.0, 159.0, 19.0, 15.0, 20.0, 33.0, 26.0]
2025-09-13 12:55:04,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 6 seconds)
2025-09-13 13:05:22,845 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:05:22,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:06:08,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 388.78485 ± 257.798
2025-09-13 13:06:08,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [835.95917, 674.47943, 465.00427, 16.577223, 555.4626, 63.03664, 232.82431, 546.3914, 340.0428, 158.0709]
2025-09-13 13:06:08,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 224.0, 180.0, 17.0, 213.0, 49.0, 103.0, 217.0, 145.0, 79.0]
2025-09-13 13:06:08,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 8 hours, 48 minutes, 17 seconds)
2025-09-13 13:16:36,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:16:36,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:17:23,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 424.75024 ± 302.309
2025-09-13 13:17:23,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [106.616806, 793.63983, 555.2224, 581.135, 151.42314, 243.66672, 328.5944, 395.32422, 51.898907, 1039.9811]
2025-09-13 13:17:23,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 270.0, 200.0, 192.0, 74.0, 105.0, 145.0, 155.0, 40.0, 334.0]
2025-09-13 13:17:23,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 37 minutes, 19 seconds)
2025-09-13 13:27:50,084 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:27:50,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:28:22,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 249.05252 ± 271.425
2025-09-13 13:28:22,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.020756, 81.05938, 196.61542, 77.94563, 906.39435, 546.9751, 342.66748, 61.095554, 12.8284025, 251.92305]
2025-09-13 13:28:22,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 47.0, 105.0, 57.0, 301.0, 202.0, 165.0, 38.0, 28.0, 112.0]
2025-09-13 13:28:22,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 8 hours, 26 minutes, 10 seconds)
2025-09-13 13:38:53,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:38:53,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:39:52,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 479.42902 ± 479.850
2025-09-13 13:39:52,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.730211, 786.8985, 113.951294, 9.378037, 71.86943, 1515.4427, 630.1115, 565.81506, 967.1075, 119.98591]
2025-09-13 13:39:52,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 323.0, 60.0, 12.0, 51.0, 573.0, 219.0, 217.0, 379.0, 101.0]
2025-09-13 13:39:52,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (479.43) for latency ExtremeSparseL4U32
2025-09-13 13:39:52,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 20 minutes, 14 seconds)
2025-09-13 13:50:21,842 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:50:21,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:50:56,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 283.59845 ± 305.407
2025-09-13 13:50:56,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [574.5184, 204.83794, 12.897831, 18.764427, 30.489765, 376.75043, 244.63135, 135.2608, 1053.2778, 184.55591]
2025-09-13 13:50:56,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 92.0, 17.0, 22.0, 27.0, 155.0, 127.0, 71.0, 342.0, 85.0]
2025-09-13 13:50:56,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 11 minutes, 39 seconds)
2025-09-13 14:01:13,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:01:13,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:01:46,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 272.61099 ± 173.500
2025-09-13 14:01:46,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [232.00458, 614.8939, 239.10776, 314.18393, 10.161221, 14.785637, 284.96582, 221.95953, 468.91013, 325.13742]
2025-09-13 14:01:46,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 232.0, 107.0, 124.0, 12.0, 21.0, 119.0, 99.0, 157.0, 136.0]
2025-09-13 14:01:46,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 7 hours, 58 minutes, 26 seconds)
2025-09-13 14:12:10,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:12:10,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:12:57,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 394.45599 ± 295.985
2025-09-13 14:12:57,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [477.84225, 498.52615, 149.48608, 200.50337, 1090.459, 492.4846, 108.17973, 26.940481, 302.7194, 597.4188]
2025-09-13 14:12:57,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 184.0, 72.0, 89.0, 414.0, 186.0, 61.0, 25.0, 123.0, 196.0]
2025-09-13 14:12:57,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 7 hours, 46 minutes, 43 seconds)
2025-09-13 14:23:25,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:23:25,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:24:28,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 532.97302 ± 630.969
2025-09-13 14:24:28,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [616.464, 8.768115, 1233.6656, 108.99117, 286.8951, 1979.9043, 129.7521, 13.981971, 942.5899, 8.71795]
2025-09-13 14:24:28,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [241.0, 14.0, 465.0, 59.0, 126.0, 713.0, 66.0, 16.0, 349.0, 16.0]
2025-09-13 14:24:28,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1226 [INFO]: New best (532.97) for latency ExtremeSparseL4U32
2025-09-13 14:24:28,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 7 hours, 39 minutes, 59 seconds)
2025-09-13 14:34:54,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:34:54,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:35:36,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 345.65732 ± 274.611
2025-09-13 14:35:36,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [386.85992, 225.51622, 460.96146, 86.890976, 12.015189, 330.7084, 417.91318, 181.94618, 1060.7725, 292.98914]
2025-09-13 14:35:36,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 109.0, 173.0, 60.0, 14.0, 158.0, 173.0, 93.0, 344.0, 128.0]
2025-09-13 14:35:36,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 25 minutes, 53 seconds)
2025-09-13 14:46:05,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:46:05,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:46:24,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 137.73949 ± 175.699
2025-09-13 14:46:24,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.302467, 145.63719, 617.47394, 171.48602, 15.260393, 171.68173, 188.69574, 9.177077, 22.389065, 18.29127]
2025-09-13 14:46:24,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 71.0, 198.0, 86.0, 20.0, 83.0, 84.0, 12.0, 25.0, 21.0]
2025-09-13 14:46:24,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 12 minutes, 38 seconds)
2025-09-13 14:56:48,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:56:48,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:57:25,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 303.29846 ± 287.117
2025-09-13 14:57:25,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.940705, 718.93884, 71.672356, 13.324425, 318.3902, 432.67215, 551.19214, 8.82228, 783.93506, 122.09633]
2025-09-13 14:57:25,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 273.0, 44.0, 18.0, 124.0, 179.0, 218.0, 14.0, 291.0, 62.0]
2025-09-13 14:57:25,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 2 minutes, 55 seconds)
2025-09-13 15:07:54,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:07:54,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:08:40,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 416.33585 ± 246.279
2025-09-13 15:08:40,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [301.55685, 154.86418, 164.6815, 150.68283, 770.1802, 696.34265, 471.65814, 530.0996, 750.9335, 172.35904]
2025-09-13 15:08:40,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 77.0, 82.0, 72.0, 269.0, 214.0, 186.0, 175.0, 237.0, 82.0]
2025-09-13 15:08:40,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 6 hours, 52 minutes, 18 seconds)
2025-09-13 15:19:11,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:19:11,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:19:55,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 417.06543 ± 314.727
2025-09-13 15:19:55,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.23152, 12.767492, 408.71024, 24.109898, 841.1599, 267.3508, 613.90344, 768.7717, 797.53375, 423.1155]
2025-09-13 15:19:55,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 22.0, 148.0, 27.0, 285.0, 112.0, 194.0, 244.0, 257.0, 172.0]
2025-09-13 15:19:55,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 6 hours, 39 minutes, 19 seconds)
2025-09-13 15:30:19,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:30:19,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:31:07,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 425.54990 ± 266.948
2025-09-13 15:31:07,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [875.6118, 240.98709, 593.87964, 557.11224, 83.17063, 19.888565, 170.0375, 570.34906, 670.4122, 474.05045]
2025-09-13 15:31:07,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [308.0, 114.0, 224.0, 195.0, 48.0, 18.0, 78.0, 214.0, 206.0, 177.0]
2025-09-13 15:31:07,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 28 minutes, 37 seconds)
2025-09-13 15:41:35,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:41:35,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:42:04,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 224.74072 ± 146.633
2025-09-13 15:42:04,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.254031, 160.3437, 227.22365, 31.435143, 231.73097, 110.45477, 455.86652, 456.93536, 257.38898, 304.77417]
2025-09-13 15:42:04,392 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 83.0, 98.0, 31.0, 104.0, 64.0, 179.0, 162.0, 113.0, 132.0]
2025-09-13 15:42:04,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 18 minutes, 31 seconds)
2025-09-13 15:52:31,131 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:52:31,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:53:16,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 379.35489 ± 226.810
2025-09-13 15:53:16,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [460.0926, 212.48062, 243.43268, 386.46625, 168.06657, 17.89725, 502.2007, 315.05157, 719.59235, 768.26825]
2025-09-13 15:53:16,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 94.0, 109.0, 156.0, 80.0, 19.0, 251.0, 131.0, 250.0, 252.0]
2025-09-13 15:53:16,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 8 minutes, 38 seconds)
2025-09-13 16:03:38,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:03:38,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:04:03,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 196.67142 ± 231.855
2025-09-13 16:04:03,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [19.69022, 13.4289465, 12.997051, 14.975008, 178.24084, 516.5037, 725.7601, 129.14508, 82.87946, 273.0939]
2025-09-13 16:04:03,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 15.0, 31.0, 16.0, 84.0, 188.0, 234.0, 66.0, 57.0, 122.0]
2025-09-13 16:04:03,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 5 hours, 54 minutes, 28 seconds)
2025-09-13 16:14:29,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:14:29,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:15:20,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 444.99408 ± 299.831
2025-09-13 16:15:20,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [6.534507, 215.74318, 370.84436, 387.585, 327.5126, 323.50912, 265.12885, 609.2635, 1005.17975, 938.6403]
2025-09-13 16:15:20,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 100.0, 151.0, 162.0, 135.0, 136.0, 112.0, 235.0, 342.0, 297.0]
2025-09-13 16:15:20,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 5 hours, 43 minutes, 32 seconds)
2025-09-13 16:25:48,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:25:48,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:26:23,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 289.92792 ± 202.826
2025-09-13 16:26:23,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [608.4638, 119.09612, 626.23834, 173.63814, 16.440434, 461.9153, 204.59001, 184.40572, 371.5679, 132.92358]
2025-09-13 16:26:23,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [212.0, 62.0, 202.0, 109.0, 26.0, 174.0, 93.0, 84.0, 145.0, 66.0]
2025-09-13 16:26:23,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 31 minutes, 36 seconds)
2025-09-13 16:36:54,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:36:54,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:37:43,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 451.97949 ± 524.370
2025-09-13 16:37:43,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [647.54803, 828.24225, 11.265661, 263.02713, 1781.7004, 295.10434, 69.127655, 16.209515, 15.0588875, 592.5114]
2025-09-13 16:37:43,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 266.0, 13.0, 113.0, 584.0, 123.0, 48.0, 27.0, 18.0, 210.0]
2025-09-13 16:37:43,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 22 minutes, 46 seconds)
2025-09-13 16:48:04,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:48:04,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:48:42,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 314.18878 ± 170.870
2025-09-13 16:48:42,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [143.13991, 611.78046, 426.48022, 297.3224, 354.45862, 181.73236, 243.95105, 132.0829, 595.1052, 155.83472]
2025-09-13 16:48:42,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 233.0, 162.0, 119.0, 141.0, 94.0, 109.0, 69.0, 198.0, 76.0]
2025-09-13 16:48:42,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 10 minutes, 28 seconds)
2025-09-13 16:59:14,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:59:14,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:00:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 409.65118 ± 266.080
2025-09-13 17:00:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [473.69333, 282.97083, 340.06943, 298.65582, 931.45984, 580.1313, 531.1548, 23.541206, 10.443033, 624.39233]
2025-09-13 17:00:02,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 123.0, 145.0, 126.0, 309.0, 214.0, 220.0, 31.0, 18.0, 236.0]
2025-09-13 17:00:02,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 2 minutes, 18 seconds)
2025-09-13 17:10:26,137 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:10:26,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:11:07,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 329.95428 ± 335.389
2025-09-13 17:11:07,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [236.31958, 689.9205, 1149.298, 116.41804, 138.89822, 331.1229, 10.851814, 16.929037, 178.35397, 431.431]
2025-09-13 17:11:07,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 235.0, 430.0, 64.0, 71.0, 135.0, 17.0, 27.0, 98.0, 179.0]
2025-09-13 17:11:07,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 4 hours, 50 minutes, 5 seconds)
2025-09-13 17:21:29,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:21:29,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:21:54,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 196.77666 ± 360.659
2025-09-13 17:21:54,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [726.1178, 1073.7981, 79.15878, 11.060716, 8.749293, 10.185286, 18.826206, 25.216019, 8.54768, 6.1067204]
2025-09-13 17:21:54,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [284.0, 366.0, 53.0, 19.0, 12.0, 20.0, 25.0, 24.0, 18.0, 9.0]
2025-09-13 17:21:54,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 37 minutes, 32 seconds)
2025-09-13 17:32:28,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:32:28,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:33:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 341.53427 ± 233.594
2025-09-13 17:33:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [559.74316, 742.69324, 510.34402, 613.4992, 210.29286, 135.89134, 142.29787, 10.580602, 305.33853, 184.6615]
2025-09-13 17:33:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [202.0, 277.0, 199.0, 245.0, 97.0, 73.0, 81.0, 16.0, 140.0, 89.0]
2025-09-13 17:33:10,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 26 minutes, 12 seconds)
2025-09-13 17:43:42,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:43:42,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:44:17,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 291.47531 ± 194.588
2025-09-13 17:44:17,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [22.58022, 10.332316, 219.90211, 226.02968, 576.4626, 137.40024, 564.86786, 409.153, 447.21957, 300.80557]
2025-09-13 17:44:17,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 13.0, 103.0, 99.0, 220.0, 71.0, 210.0, 166.0, 164.0, 131.0]
2025-09-13 17:44:18,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 15 minutes, 41 seconds)
2025-09-13 17:54:31,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:54:31,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:55:21,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 449.49570 ± 503.601
2025-09-13 17:55:21,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [822.9845, 213.93527, 18.671183, 9.430806, 387.2547, 1720.1069, 743.8837, 348.3933, 222.63174, 7.6649237]
2025-09-13 17:55:21,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [304.0, 98.0, 20.0, 11.0, 158.0, 601.0, 229.0, 147.0, 109.0, 10.0]
2025-09-13 17:55:21,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 3 minutes, 25 seconds)
2025-09-13 18:05:54,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:05:54,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:06:14,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 129.52736 ± 146.958
2025-09-13 18:06:14,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [112.2931, 19.824883, 15.42742, 185.27673, 476.24704, 292.9752, 9.318033, 151.17487, 21.375256, 11.360988]
2025-09-13 18:06:14,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 19.0, 33.0, 86.0, 184.0, 128.0, 12.0, 73.0, 26.0, 15.0]
2025-09-13 18:06:14,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 3 hours, 51 minutes, 27 seconds)
2025-09-13 18:16:42,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:16:42,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:17:17,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 274.18826 ± 318.455
2025-09-13 18:17:17,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [66.28988, 20.141018, 14.520524, 10.243101, 648.8064, 612.97766, 61.631756, 392.10062, 897.93085, 17.240608]
2025-09-13 18:17:17,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 23.0, 30.0, 19.0, 254.0, 253.0, 37.0, 164.0, 331.0, 22.0]
2025-09-13 18:17:17,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 41 minutes, 34 seconds)
2025-09-13 18:27:33,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:27:33,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:28:12,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 326.72781 ± 309.031
2025-09-13 18:28:12,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.7299185, 19.533619, 233.49388, 101.999214, 449.62625, 744.871, 669.8033, 167.44377, 850.3185, 19.458853]
2025-09-13 18:28:12,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 20.0, 101.0, 56.0, 179.0, 285.0, 221.0, 81.0, 337.0, 22.0]
2025-09-13 18:28:12,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 29 minutes, 6 seconds)
2025-09-13 18:38:41,176 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:38:41,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:39:02,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 132.86600 ± 109.328
2025-09-13 18:39:02,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.478652, 252.42773, 77.68591, 79.71241, 274.65082, 146.0749, 334.1016, 96.464455, 44.31402, 8.749648]
2025-09-13 18:39:02,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 161.0, 53.0, 46.0, 118.0, 73.0, 139.0, 54.0, 37.0, 18.0]
2025-09-13 18:39:02,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 17 minutes, 5 seconds)
2025-09-13 18:49:29,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:49:29,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:49:58,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 220.99358 ± 230.969
2025-09-13 18:49:58,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [295.47314, 305.2276, 17.831116, 771.6669, 413.28064, 18.404806, 9.464546, 11.672376, 242.25787, 124.65678]
2025-09-13 18:49:58,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 124.0, 18.0, 279.0, 163.0, 19.0, 14.0, 19.0, 115.0, 64.0]
2025-09-13 18:49:58,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 5 minutes, 39 seconds)
2025-09-13 19:00:23,188 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:00:23,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:00:54,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 241.96875 ± 158.941
2025-09-13 19:00:54,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.23341, 255.49467, 560.81287, 188.56282, 172.02649, 133.76324, 211.98737, 447.39642, 88.91244, 349.4977]
2025-09-13 19:00:54,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 138.0, 215.0, 86.0, 79.0, 67.0, 97.0, 166.0, 54.0, 148.0]
2025-09-13 19:00:54,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 2 hours, 54 minutes, 57 seconds)
2025-09-13 19:11:35,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:11:35,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:12:07,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 262.82623 ± 172.179
2025-09-13 19:12:07,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [361.32526, 238.95438, 388.4259, 148.91971, 167.52301, 644.15424, 235.7602, 346.81772, 87.035255, 9.346658]
2025-09-13 19:12:07,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 102.0, 142.0, 73.0, 82.0, 203.0, 100.0, 140.0, 50.0, 11.0]
2025-09-13 19:12:07,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 44 minutes, 27 seconds)
2025-09-13 19:22:20,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:22:20,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:23:03,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 386.71033 ± 343.172
2025-09-13 19:23:03,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [774.3876, 1109.5701, 549.7191, 572.762, 141.7301, 12.833035, 141.61421, 160.33257, 22.68011, 381.47415]
2025-09-13 19:23:03,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [245.0, 379.0, 196.0, 188.0, 74.0, 14.0, 69.0, 80.0, 26.0, 150.0]
2025-09-13 19:23:03,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 33 minutes, 34 seconds)
2025-09-13 19:33:31,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:33:31,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:34:11,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 349.38016 ± 288.197
2025-09-13 19:34:11,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [628.8922, 250.40186, 652.3636, 874.54584, 481.0288, 6.504913, 354.62518, 211.82352, 16.094917, 17.52059]
2025-09-13 19:34:11,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [247.0, 106.0, 237.0, 267.0, 171.0, 10.0, 147.0, 99.0, 23.0, 28.0]
2025-09-13 19:34:11,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 23 minutes, 22 seconds)
2025-09-13 19:44:47,043 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:44:47,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:45:16,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 232.89932 ± 210.587
2025-09-13 19:45:16,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.752766, 8.775606, 142.15283, 282.2338, 205.5771, 149.28076, 556.20526, 172.40916, 108.94818, 685.65784]
2025-09-13 19:45:16,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 77.0, 122.0, 95.0, 72.0, 176.0, 79.0, 58.0, 240.0]
2025-09-13 19:45:16,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2025-09-13 19:55:32,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:55:32,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:56:01,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 217.96553 ± 178.889
2025-09-13 19:56:01,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [192.80812, 582.5304, 142.76657, 324.33563, 8.43964, 235.91751, 411.91815, 11.604284, 9.7703705, 259.56464]
2025-09-13 19:56:01,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 218.0, 72.0, 144.0, 13.0, 107.0, 169.0, 14.0, 18.0, 130.0]
2025-09-13 19:56:01,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 1 minute, 14 seconds)
2025-09-13 20:06:32,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:06:32,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:06:58,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 208.38681 ± 258.919
2025-09-13 20:06:58,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [498.98404, 18.71733, 146.80208, 192.42093, 9.12256, 8.372005, 6.992672, 12.311003, 805.0169, 385.12875]
2025-09-13 20:06:58,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 21.0, 76.0, 88.0, 11.0, 15.0, 12.0, 14.0, 273.0, 163.0]
2025-09-13 20:06:58,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2025-09-13 20:17:55,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:17:55,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:18:19,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 170.98111 ± 130.579
2025-09-13 20:18:19,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [132.9675, 164.59518, 122.84818, 259.64972, 11.264847, 21.270481, 12.329496, 383.82925, 241.86534, 359.1911]
2025-09-13 20:18:19,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 81.0, 66.0, 115.0, 14.0, 25.0, 17.0, 175.0, 116.0, 146.0]
2025-09-13 20:18:19,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 39 minutes, 29 seconds)
2025-09-13 20:28:11,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:28:11,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:29:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 528.65399 ± 319.287
2025-09-13 20:29:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [862.3025, 579.4817, 501.9906, 371.00632, 1141.4387, 6.5246887, 520.3004, 78.686745, 524.99316, 699.8152]
2025-09-13 20:29:09,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [330.0, 231.0, 181.0, 152.0, 402.0, 9.0, 169.0, 47.0, 203.0, 238.0]
2025-09-13 20:29:09,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 27 minutes, 56 seconds)
2025-09-13 20:39:38,219 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:39:38,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:40:24,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 433.97125 ± 369.189
2025-09-13 20:40:24,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.032096, 1283.9536, 467.75003, 774.5967, 596.5062, 472.39694, 91.357414, 23.279799, 333.4042, 282.43512]
2025-09-13 20:40:24,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 404.0, 177.0, 241.0, 198.0, 181.0, 52.0, 22.0, 149.0, 117.0]
2025-09-13 20:40:24,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 17 minutes, 11 seconds)
2025-09-13 20:50:49,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:50:49,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:51:27,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 333.77975 ± 275.279
2025-09-13 20:51:27,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.624771, 19.868687, 192.89694, 616.7463, 532.6671, 127.275955, 768.9189, 247.11981, 120.937996, 697.7413]
2025-09-13 20:51:27,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 90.0, 230.0, 179.0, 66.0, 261.0, 115.0, 62.0, 217.0]
2025-09-13 20:51:27,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 6 minutes, 31 seconds)
2025-09-13 21:01:55,508 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:01:55,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:02:22,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 228.52243 ± 269.512
2025-09-13 21:02:22,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.577377, 19.202738, 209.68732, 25.506958, 358.87482, 37.541874, 226.11617, 13.220435, 873.30786, 509.18878]
2025-09-13 21:02:22,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 93.0, 34.0, 138.0, 30.0, 99.0, 19.0, 274.0, 199.0]
2025-09-13 21:02:22,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 55 minutes, 24 seconds)
2025-09-13 21:12:48,634 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:12:48,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:13:32,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 385.49918 ± 383.465
2025-09-13 21:13:32,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [247.6452, 630.34576, 1284.3948, 523.33887, 609.957, 86.65266, 7.243466, 16.327713, 8.189935, 440.8964]
2025-09-13 21:13:32,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 202.0, 489.0, 166.0, 221.0, 51.0, 10.0, 16.0, 12.0, 175.0]
2025-09-13 21:13:32,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 44 minutes, 10 seconds)
2025-09-13 21:24:00,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:24:00,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:24:47,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 402.42340 ± 231.125
2025-09-13 21:24:47,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [75.15166, 291.9988, 760.1842, 371.18008, 720.78107, 439.38382, 681.4168, 198.74797, 162.52797, 322.8615]
2025-09-13 21:24:47,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 121.0, 280.0, 139.0, 234.0, 164.0, 252.0, 90.0, 76.0, 134.0]
2025-09-13 21:24:47,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 33 minutes, 23 seconds)
2025-09-13 21:35:16,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:35:16,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:36:10,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 496.22525 ± 339.868
2025-09-13 21:36:10,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [199.48976, 525.09924, 12.119434, 898.0516, 236.56079, 316.8027, 1234.8076, 610.79974, 393.18225, 535.3394]
2025-09-13 21:36:10,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 192.0, 13.0, 278.0, 114.0, 124.0, 421.0, 203.0, 154.0, 191.0]
2025-09-13 21:36:10,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 22 minutes, 18 seconds)
2025-09-13 21:46:46,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:46:46,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:47:32,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 397.43890 ± 310.422
2025-09-13 21:47:32,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [667.68024, 256.1897, 697.48096, 113.83724, 191.25473, 323.78958, 9.459818, 760.3254, 899.86127, 54.509995]
2025-09-13 21:47:32,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [269.0, 114.0, 250.0, 66.0, 86.0, 138.0, 11.0, 284.0, 286.0, 33.0]
2025-09-13 21:47:32,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 12 seconds)
2025-09-13 21:57:57,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:57:57,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:58:32,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 304.54962 ± 278.903
2025-09-13 21:58:32,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.69064, 6.067534, 235.8003, 900.0441, 464.07452, 55.680923, 14.76113, 441.8241, 502.9561, 415.59683]
2025-09-13 21:58:32,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 8.0, 103.0, 287.0, 174.0, 56.0, 16.0, 162.0, 192.0, 168.0]
2025-09-13 21:58:32,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc25-hopper):1251 [DEBUG]: Training session finished
