2025-09-13 02:47:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 02:47:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 02:47:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151696fbc550>}
2025-09-13 02:47:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1111 [DEBUG]: using device: cuda
2025-09-13 02:47:49,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1133 [INFO]: Creating new trainer
2025-09-13 02:47:49,389 baseline-mbpac-noiseperc10-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-13 02:47:49,389 baseline-mbpac-noiseperc10-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 02:47:49,396 baseline-mbpac-noiseperc10-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 02:47:50,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1194 [DEBUG]: Starting training session...
2025-09-13 02:47:50,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 1/100
2025-09-13 02:59:05,124 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:59:05,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:59:14,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 53.60060 ± 14.933
2025-09-13 02:59:14,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [48.342827, 43.235054, 39.500866, 74.63544, 73.85162, 58.803017, 68.4979, 44.130894, 57.645058, 27.36331]
2025-09-13 02:59:14,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 27.0, 41.0, 42.0, 35.0, 39.0, 28.0, 33.0, 20.0]
2025-09-13 02:59:14,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (53.60) for latency ExtremeSparseL4U32
2025-09-13 02:59:14,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 49 minutes, 12 seconds)
2025-09-13 03:10:07,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:10:07,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:10:32,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 157.84534 ± 50.781
2025-09-13 03:10:32,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [188.27498, 81.77388, 233.04988, 92.88916, 192.8692, 127.83172, 186.05368, 167.09421, 209.82985, 98.7869]
2025-09-13 03:10:32,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 50.0, 109.0, 56.0, 94.0, 67.0, 90.0, 81.0, 97.0, 58.0]
2025-09-13 03:10:32,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (157.85) for latency ExtremeSparseL4U32
2025-09-13 03:10:32,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 32 minutes, 12 seconds)
2025-09-13 03:21:11,442 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:21:11,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:21:35,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 110.52661 ± 66.731
2025-09-13 03:21:35,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [227.70844, 54.73774, 114.86347, 84.58977, 191.71178, 78.67598, 60.710068, 25.319887, 199.02829, 67.92063]
2025-09-13 03:21:35,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 64.0, 76.0, 59.0, 125.0, 59.0, 39.0, 29.0, 158.0, 54.0]
2025-09-13 03:21:35,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 11 minutes, 28 seconds)
2025-09-13 03:32:23,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:32:23,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:32:48,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 150.86018 ± 97.275
2025-09-13 03:32:48,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [193.942, 62.8982, 182.56375, 324.11014, 282.5013, 44.805813, 54.213894, 159.65384, 27.012096, 176.90082]
2025-09-13 03:32:48,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 40.0, 91.0, 158.0, 110.0, 31.0, 37.0, 91.0, 28.0, 139.0]
2025-09-13 03:32:48,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 59 minutes, 13 seconds)
2025-09-13 03:43:23,548 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:43:23,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:43:47,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 163.51469 ± 91.455
2025-09-13 03:43:47,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [130.98676, 254.89539, 104.80668, 360.74814, 121.554184, 93.0747, 77.2485, 187.60333, 60.24541, 243.98395]
2025-09-13 03:43:47,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 114.0, 60.0, 143.0, 65.0, 55.0, 47.0, 102.0, 40.0, 102.0]
2025-09-13 03:43:47,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (163.51) for latency ExtremeSparseL4U32
2025-09-13 03:43:47,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 42 minutes, 57 seconds)
2025-09-13 03:54:29,210 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:54:29,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:54:57,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 219.87595 ± 118.131
2025-09-13 03:54:57,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [186.44778, 144.9309, 159.08482, 76.787926, 278.33435, 261.98993, 19.462622, 404.96432, 292.9637, 373.79312]
2025-09-13 03:54:57,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 73.0, 79.0, 47.0, 111.0, 129.0, 25.0, 149.0, 122.0, 138.0]
2025-09-13 03:54:57,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (219.88) for latency ExtremeSparseL4U32
2025-09-13 03:54:57,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 27 minutes, 17 seconds)
2025-09-13 04:05:34,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:05:34,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:06:14,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 297.84970 ± 44.795
2025-09-13 04:06:14,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [254.06647, 344.18207, 330.45297, 320.45654, 203.0488, 313.6117, 298.39804, 297.27, 258.6815, 358.3289]
2025-09-13 04:06:14,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 147.0, 132.0, 135.0, 105.0, 127.0, 132.0, 124.0, 127.0, 173.0]
2025-09-13 04:06:14,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (297.85) for latency ExtremeSparseL4U32
2025-09-13 04:06:14,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 16 minutes, 5 seconds)
2025-09-13 04:16:54,503 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:16:54,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:17:28,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 253.32535 ± 75.217
2025-09-13 04:17:28,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [280.3218, 113.998314, 293.97696, 339.62207, 316.49933, 137.01274, 286.7608, 324.32898, 247.6105, 193.12175]
2025-09-13 04:17:28,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 60.0, 131.0, 152.0, 130.0, 74.0, 118.0, 147.0, 121.0, 106.0]
2025-09-13 04:17:28,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 8 minutes, 12 seconds)
2025-09-13 04:28:09,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:28:09,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:28:58,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 407.73297 ± 205.406
2025-09-13 04:28:58,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [206.29106, 251.71408, 508.57162, 777.57635, 716.0236, 395.0441, 149.75314, 496.6123, 211.53673, 364.2067]
2025-09-13 04:28:58,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 106.0, 202.0, 302.0, 227.0, 150.0, 76.0, 175.0, 90.0, 153.0]
2025-09-13 04:28:58,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (407.73) for latency ExtremeSparseL4U32
2025-09-13 04:28:58,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 2 minutes, 11 seconds)
2025-09-13 04:39:34,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:39:34,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:40:06,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 246.42241 ± 134.169
2025-09-13 04:40:06,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [179.80513, 270.98093, 272.36975, 96.37022, 178.49902, 91.175865, 233.61282, 277.7137, 594.8826, 268.814]
2025-09-13 04:40:06,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 114.0, 152.0, 57.0, 84.0, 61.0, 103.0, 117.0, 197.0, 115.0]
2025-09-13 04:40:06,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 53 minutes, 55 seconds)
2025-09-13 04:50:47,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:50:47,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:51:27,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 325.52362 ± 180.892
2025-09-13 04:51:27,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [194.27925, 558.5566, 580.71674, 236.40102, 176.91545, 598.56934, 299.76468, 362.57047, 159.15013, 88.31264]
2025-09-13 04:51:27,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 193.0, 201.0, 104.0, 95.0, 222.0, 122.0, 140.0, 74.0, 65.0]
2025-09-13 04:51:27,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 45 minutes, 42 seconds)
2025-09-13 05:02:10,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:02:10,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:02:48,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 335.90421 ± 274.464
2025-09-13 05:02:48,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [109.571594, 18.918718, 13.849161, 677.25745, 269.12036, 554.0994, 829.7849, 514.8952, 250.02025, 121.52496]
2025-09-13 05:02:48,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 27.0, 17.0, 275.0, 121.0, 188.0, 261.0, 167.0, 101.0, 66.0]
2025-09-13 05:02:48,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 35 minutes, 35 seconds)
2025-09-13 05:13:26,613 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:13:26,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:14:02,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 286.72125 ± 202.476
2025-09-13 05:14:02,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.499922, 95.404396, 176.34041, 348.97885, 216.62302, 34.800423, 449.9494, 610.8673, 577.4176, 335.33118]
2025-09-13 05:14:02,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 64.0, 115.0, 140.0, 94.0, 33.0, 165.0, 235.0, 220.0, 144.0]
2025-09-13 05:14:02,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 24 minutes, 16 seconds)
2025-09-13 05:24:45,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:24:45,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:25:21,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 283.95615 ± 205.148
2025-09-13 05:25:21,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [240.33382, 12.949288, 263.67685, 104.6738, 480.20865, 148.79126, 260.40195, 281.7258, 257.93253, 788.86743]
2025-09-13 05:25:21,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 16.0, 118.0, 57.0, 188.0, 73.0, 119.0, 131.0, 126.0, 259.0]
2025-09-13 05:25:21,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 9 minutes, 52 seconds)
2025-09-13 05:35:57,505 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:35:57,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:36:35,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 300.97922 ± 192.858
2025-09-13 05:36:35,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [590.6761, 78.74569, 111.28216, 186.56609, 337.60397, 127.622086, 249.83427, 191.3289, 547.4476, 588.68524]
2025-09-13 05:36:35,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [220.0, 66.0, 73.0, 87.0, 136.0, 65.0, 131.0, 102.0, 222.0, 189.0]
2025-09-13 05:36:35,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 3 seconds)
2025-09-13 05:47:12,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:47:12,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:48:08,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 461.04425 ± 200.307
2025-09-13 05:48:08,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [435.64664, 657.2124, 415.05, 295.41812, 669.2348, 185.84283, 142.4411, 542.7446, 481.99478, 784.857]
2025-09-13 05:48:08,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 269.0, 158.0, 128.0, 282.0, 84.0, 89.0, 208.0, 184.0, 310.0]
2025-09-13 05:48:08,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (461.04) for latency ExtremeSparseL4U32
2025-09-13 05:48:08,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 52 minutes, 15 seconds)
2025-09-13 05:58:51,686 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:58:51,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:59:29,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 280.33694 ± 237.902
2025-09-13 05:59:29,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [201.58601, 307.5064, 758.72296, 151.22813, 158.11526, 300.2521, 675.3569, 22.732567, 12.944072, 214.92516]
2025-09-13 05:59:29,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 129.0, 307.0, 105.0, 77.0, 120.0, 288.0, 28.0, 16.0, 113.0]
2025-09-13 05:59:29,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 40 minutes, 56 seconds)
2025-09-13 06:10:24,034 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:10:24,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:11:00,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 262.65649 ± 135.326
2025-09-13 06:11:00,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [272.86786, 216.8052, 174.1184, 625.5977, 285.89383, 215.56702, 286.87735, 71.51208, 225.88358, 251.44186]
2025-09-13 06:11:00,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 102.0, 82.0, 273.0, 129.0, 100.0, 126.0, 43.0, 102.0, 118.0]
2025-09-13 06:11:00,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 34 minutes, 9 seconds)
2025-09-13 06:21:22,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:21:22,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:22:26,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 598.99774 ± 262.614
2025-09-13 06:22:26,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [248.56111, 498.83673, 446.6209, 718.96265, 931.662, 792.27704, 598.7772, 818.06915, 848.19354, 88.017586]
2025-09-13 06:22:26,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 197.0, 173.0, 236.0, 291.0, 294.0, 224.0, 287.0, 310.0, 51.0]
2025-09-13 06:22:26,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (599.00) for latency ExtremeSparseL4U32
2025-09-13 06:22:26,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 24 minutes, 38 seconds)
2025-09-13 06:32:59,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:32:59,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:33:50,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 418.62637 ± 464.435
2025-09-13 06:33:50,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [108.76289, 93.59631, 1656.2599, 158.48807, 381.1994, 712.63513, 437.0332, 104.167366, 19.150337, 514.97125]
2025-09-13 06:33:50,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 54.0, 602.0, 77.0, 172.0, 272.0, 186.0, 68.0, 23.0, 219.0]
2025-09-13 06:33:50,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 16 minutes, 4 seconds)
2025-09-13 06:44:31,910 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:44:31,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:45:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 536.48474 ± 352.761
2025-09-13 06:45:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [473.43155, 495.11267, 188.59042, 250.56018, 218.55031, 1021.7792, 70.45553, 895.0036, 627.1761, 1124.1875]
2025-09-13 06:45:35,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 202.0, 101.0, 116.0, 100.0, 397.0, 59.0, 323.0, 253.0, 439.0]
2025-09-13 06:45:35,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 7 minutes, 55 seconds)
2025-09-13 06:56:11,647 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:56:11,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:57:34,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 730.15021 ± 440.917
2025-09-13 06:57:34,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [871.2317, 696.29047, 1247.423, 867.5896, 442.0424, 1467.6171, 24.051338, 116.060844, 1037.6681, 531.52734]
2025-09-13 06:57:34,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [320.0, 270.0, 482.0, 344.0, 177.0, 504.0, 29.0, 91.0, 388.0, 221.0]
2025-09-13 06:57:34,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (730.15) for latency ExtremeSparseL4U32
2025-09-13 06:57:34,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 6 minutes, 6 seconds)
2025-09-13 07:08:09,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:08:09,318 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:09:11,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 520.95996 ± 206.354
2025-09-13 07:09:11,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [602.4214, 226.7694, 409.61377, 765.75244, 292.34726, 287.4021, 869.5624, 679.9566, 482.5484, 593.22534]
2025-09-13 07:09:11,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [237.0, 101.0, 186.0, 293.0, 122.0, 121.0, 308.0, 269.0, 195.0, 250.0]
2025-09-13 07:09:11,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 56 minutes)
2025-09-13 07:19:56,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:19:56,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:20:56,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 504.61279 ± 335.652
2025-09-13 07:20:56,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [214.57767, 698.9802, 591.45746, 24.351585, 935.3293, 207.19489, 638.61127, 1119.9905, 403.87665, 211.75806]
2025-09-13 07:20:56,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 264.0, 230.0, 31.0, 324.0, 101.0, 247.0, 413.0, 167.0, 103.0]
2025-09-13 07:20:56,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 49 minutes, 13 seconds)
2025-09-13 07:31:33,311 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:31:33,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:32:30,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 496.87231 ± 239.131
2025-09-13 07:32:30,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [183.56474, 391.11514, 392.8817, 585.74097, 859.20184, 417.3985, 575.313, 546.35126, 894.134, 123.02218]
2025-09-13 07:32:30,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 173.0, 164.0, 208.0, 325.0, 167.0, 209.0, 205.0, 333.0, 64.0]
2025-09-13 07:32:30,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 14 hours, 39 minutes, 56 seconds)
2025-09-13 07:43:04,769 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:43:04,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:43:58,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 428.94791 ± 496.087
2025-09-13 07:43:58,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [296.23898, 1615.5184, 117.13809, 84.84627, 194.98833, 911.5841, 18.618828, 32.584442, 197.6327, 820.32874]
2025-09-13 07:43:58,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 593.0, 76.0, 59.0, 101.0, 352.0, 23.0, 33.0, 117.0, 326.0]
2025-09-13 07:43:58,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 23 minutes, 54 seconds)
2025-09-13 07:54:38,679 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:54:38,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:56:18,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 910.45978 ± 766.573
2025-09-13 07:56:18,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [73.09117, 16.361975, 451.3169, 903.6267, 190.72627, 370.62543, 1658.1393, 1683.8953, 1475.685, 2281.13]
2025-09-13 07:56:18,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 21.0, 182.0, 362.0, 114.0, 155.0, 564.0, 618.0, 508.0, 796.0]
2025-09-13 07:56:18,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (910.46) for latency ExtremeSparseL4U32
2025-09-13 07:56:18,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 17 minutes, 23 seconds)
2025-09-13 08:07:02,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:07:02,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:08:02,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 486.01801 ± 383.658
2025-09-13 08:08:02,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [959.26624, 397.44882, 375.02484, 83.883804, 278.89243, 100.20718, 540.85315, 1024.8392, 14.695469, 1085.0687]
2025-09-13 08:08:02,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [390.0, 219.0, 161.0, 51.0, 125.0, 57.0, 212.0, 379.0, 19.0, 434.0]
2025-09-13 08:08:02,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 7 minutes, 31 seconds)
2025-09-13 08:18:40,759 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:18:40,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:19:35,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 449.29297 ± 217.907
2025-09-13 08:19:35,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [327.70755, 391.6674, 649.89984, 22.178215, 324.07867, 815.59326, 580.3032, 383.75885, 328.82663, 668.91583]
2025-09-13 08:19:35,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 160.0, 246.0, 21.0, 140.0, 327.0, 224.0, 156.0, 136.0, 267.0]
2025-09-13 08:19:35,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 52 minutes, 45 seconds)
2025-09-13 08:30:15,947 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:30:15,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:31:05,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 399.14862 ± 392.607
2025-09-13 08:31:05,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [222.15839, 18.151129, 266.5238, 364.86926, 25.199324, 582.2764, 831.358, 103.37271, 243.4207, 1334.1565]
2025-09-13 08:31:05,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 21.0, 124.0, 146.0, 27.0, 250.0, 307.0, 82.0, 108.0, 503.0]
2025-09-13 08:31:05,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 40 minutes, 15 seconds)
2025-09-13 08:41:33,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:41:33,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:42:23,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 436.83405 ± 412.698
2025-09-13 08:42:23,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [669.86035, 16.204584, 28.300983, 165.68694, 1233.0602, 946.6229, 563.37585, 623.24945, 107.89638, 14.082291]
2025-09-13 08:42:23,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [244.0, 19.0, 27.0, 79.0, 467.0, 299.0, 224.0, 237.0, 59.0, 17.0]
2025-09-13 08:42:23,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 13 hours, 26 minutes, 16 seconds)
2025-09-13 08:52:52,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:52:52,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:53:50,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 530.98834 ± 466.991
2025-09-13 08:53:50,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [27.861433, 1177.9117, 1174.5237, 1128.7397, 351.4586, 81.81962, 737.4974, 526.3858, 82.71672, 20.968447]
2025-09-13 08:53:50,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 429.0, 370.0, 415.0, 136.0, 68.0, 233.0, 202.0, 56.0, 31.0]
2025-09-13 08:53:50,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 2 minutes, 29 seconds)
2025-09-13 09:04:29,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:04:29,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:05:34,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 574.48572 ± 238.822
2025-09-13 09:05:34,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [303.74252, 639.53705, 823.88715, 865.2165, 613.3647, 836.6859, 587.0724, 529.50946, 487.13507, 58.706448]
2025-09-13 09:05:34,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [129.0, 252.0, 305.0, 322.0, 235.0, 305.0, 218.0, 208.0, 198.0, 35.0]
2025-09-13 09:05:34,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 50 minutes, 57 seconds)
2025-09-13 09:16:07,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:16:07,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:17:09,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 525.62146 ± 380.292
2025-09-13 09:17:09,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [832.5533, 1007.8131, 144.83536, 186.93391, 560.15576, 189.44278, 1318.7833, 251.27003, 388.51575, 375.911]
2025-09-13 09:17:09,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [323.0, 387.0, 82.0, 90.0, 218.0, 90.0, 462.0, 113.0, 160.0, 154.0]
2025-09-13 09:17:09,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 40 minutes, 2 seconds)
2025-09-13 09:27:45,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:27:45,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:29:04,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 684.60309 ± 447.200
2025-09-13 09:29:04,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [309.0188, 1665.7883, 848.82263, 869.08954, 267.07968, 956.64264, 611.6289, 908.9617, 391.71185, 17.286858]
2025-09-13 09:29:04,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 606.0, 341.0, 333.0, 120.0, 370.0, 244.0, 357.0, 153.0, 23.0]
2025-09-13 09:29:04,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 33 minutes, 38 seconds)
2025-09-13 09:39:56,801 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:39:56,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:40:54,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 475.76581 ± 533.511
2025-09-13 09:40:54,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [369.4302, 619.9414, 583.6449, 1962.3416, 212.32623, 105.870056, 112.94645, 82.71152, 542.3696, 166.07637]
2025-09-13 09:40:54,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 260.0, 218.0, 713.0, 101.0, 94.0, 94.0, 48.0, 197.0, 79.0]
2025-09-13 09:40:54,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 28 minutes, 55 seconds)
2025-09-13 09:51:14,821 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:51:14,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:52:56,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 968.33173 ± 990.958
2025-09-13 09:52:56,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [82.93996, 367.99353, 141.64165, 25.728567, 1374.4111, 2330.277, 604.0945, 91.57315, 1851.9158, 2812.742]
2025-09-13 09:52:56,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 148.0, 72.0, 29.0, 456.0, 785.0, 239.0, 53.0, 632.0, 1000.0]
2025-09-13 09:52:56,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (968.33) for latency ExtremeSparseL4U32
2025-09-13 09:52:56,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 24 minutes, 44 seconds)
2025-09-13 10:03:53,281 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:03:53,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:05:01,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 576.96338 ± 232.550
2025-09-13 10:05:01,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [652.59314, 715.69037, 830.74835, 570.3388, 805.2838, 130.22598, 641.424, 145.71017, 669.54663, 608.072]
2025-09-13 10:05:01,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [286.0, 273.0, 290.0, 234.0, 333.0, 66.0, 251.0, 73.0, 255.0, 247.0]
2025-09-13 10:05:01,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 17 minutes, 9 seconds)
2025-09-13 10:15:08,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:15:08,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:16:19,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 610.22290 ± 515.749
2025-09-13 10:16:19,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [668.8438, 1859.3937, 203.58496, 553.67224, 233.59962, 61.73203, 1038.8331, 364.84653, 902.1368, 215.58647]
2025-09-13 10:16:19,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [256.0, 683.0, 94.0, 243.0, 112.0, 50.0, 373.0, 151.0, 314.0, 100.0]
2025-09-13 10:16:19,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 1 minute, 44 seconds)
2025-09-13 10:27:01,527 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:27:01,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:27:58,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 468.82095 ± 591.549
2025-09-13 10:27:58,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [557.9398, 193.85744, 107.58901, 22.917654, 98.1211, 124.08047, 2036.3749, 813.9289, 25.389084, 708.01105]
2025-09-13 10:27:58,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [219.0, 103.0, 60.0, 25.0, 70.0, 92.0, 755.0, 301.0, 26.0, 283.0]
2025-09-13 10:27:58,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 46 minutes, 47 seconds)
2025-09-13 10:38:39,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:38:39,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:40:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 827.16248 ± 839.027
2025-09-13 10:40:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [492.53427, 2829.119, 28.671772, 1825.1339, 107.44562, 747.46265, 1060.8165, 372.82407, 670.7404, 136.87733]
2025-09-13 10:40:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [196.0, 1000.0, 30.0, 655.0, 60.0, 283.0, 346.0, 147.0, 257.0, 83.0]
2025-09-13 10:40:10,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 39 minutes, 25 seconds)
2025-09-13 10:51:19,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:51:19,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:52:09,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 404.50009 ± 602.034
2025-09-13 10:52:09,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [292.15103, 357.628, 1026.9082, 30.398642, 32.101547, 100.96208, 18.711634, 171.49889, 1987.5161, 27.124762]
2025-09-13 10:52:09,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 155.0, 376.0, 29.0, 43.0, 77.0, 29.0, 83.0, 695.0, 32.0]
2025-09-13 10:52:09,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 26 minutes, 54 seconds)
2025-09-13 11:02:02,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:02:02,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:03:01,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 497.53491 ± 415.677
2025-09-13 11:03:01,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [26.019138, 989.5885, 1328.9272, 343.36768, 708.9905, 205.42168, 136.72568, 83.495705, 347.98712, 804.82587]
2025-09-13 11:03:01,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 364.0, 459.0, 148.0, 280.0, 93.0, 90.0, 48.0, 145.0, 309.0]
2025-09-13 11:03:01,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 1 minute, 8 seconds)
2025-09-13 11:13:34,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:13:34,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:14:38,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 563.00250 ± 578.115
2025-09-13 11:14:38,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [950.319, 87.01476, 23.237091, 1979.21, 25.84007, 23.801508, 728.3249, 505.81818, 507.5729, 798.88654]
2025-09-13 11:14:38,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [356.0, 54.0, 26.0, 722.0, 29.0, 29.0, 263.0, 199.0, 218.0, 284.0]
2025-09-13 11:14:38,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 53 minutes, 11 seconds)
2025-09-13 11:25:24,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:25:24,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:26:38,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 665.40558 ± 562.575
2025-09-13 11:26:38,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1171.9359, 1234.5068, 671.98047, 101.747765, 137.44785, 1468.4254, 1389.1204, 128.87468, 333.93802, 16.078794]
2025-09-13 11:26:38,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [415.0, 445.0, 245.0, 55.0, 77.0, 523.0, 508.0, 67.0, 135.0, 24.0]
2025-09-13 11:26:38,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 45 minutes, 28 seconds)
2025-09-13 11:37:12,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:37:12,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:38:51,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 921.59973 ± 1035.169
2025-09-13 11:38:51,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [176.12228, 18.842972, 2082.9158, 101.29165, 252.2662, 2690.658, 140.48796, 148.93533, 1099.4867, 2504.9897]
2025-09-13 11:38:51,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 28.0, 735.0, 56.0, 107.0, 962.0, 72.0, 99.0, 352.0, 911.0]
2025-09-13 11:38:51,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 33 minutes, 47 seconds)
2025-09-13 11:49:53,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:49:53,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:51:27,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 855.08118 ± 814.038
2025-09-13 11:51:27,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [358.53085, 94.240364, 1160.4528, 987.7398, 1527.833, 2828.8894, 555.8097, 19.456314, 905.34827, 112.510635]
2025-09-13 11:51:27,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 75.0, 404.0, 374.0, 534.0, 1000.0, 206.0, 26.0, 315.0, 65.0]
2025-09-13 11:51:27,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 28 minutes, 35 seconds)
2025-09-13 12:01:57,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:01:57,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:02:59,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 551.07458 ± 821.681
2025-09-13 12:02:59,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [57.968494, 23.936598, 438.49698, 15.415843, 2883.7878, 705.1917, 137.39897, 610.16956, 16.289501, 622.0908]
2025-09-13 12:02:59,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 28.0, 194.0, 19.0, 990.0, 258.0, 86.0, 232.0, 17.0, 246.0]
2025-09-13 12:02:59,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 23 minutes, 42 seconds)
2025-09-13 12:13:09,719 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:13:09,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:13:46,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 295.82184 ± 155.841
2025-09-13 12:13:46,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [287.50528, 334.2143, 92.96012, 310.88248, 434.3421, 636.725, 292.0167, 96.956795, 330.8942, 141.72116]
2025-09-13 12:13:46,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 143.0, 53.0, 125.0, 190.0, 231.0, 118.0, 56.0, 130.0, 82.0]
2025-09-13 12:13:46,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 3 minutes, 12 seconds)
2025-09-13 12:24:29,301 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:24:29,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:25:10,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 324.68463 ± 304.625
2025-09-13 12:25:10,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [626.22473, 925.70325, 94.36452, 723.42316, 27.421982, 197.49399, 116.39822, 170.91191, 343.2953, 21.609364]
2025-09-13 12:25:10,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [225.0, 342.0, 63.0, 294.0, 30.0, 98.0, 72.0, 91.0, 145.0, 31.0]
2025-09-13 12:25:10,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 45 minutes, 11 seconds)
2025-09-13 12:35:48,588 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:35:48,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:37:28,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 877.35773 ± 737.916
2025-09-13 12:37:28,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [146.2756, 2368.746, 608.2691, 295.83078, 244.53935, 638.83716, 1632.0829, 988.82074, 1709.9325, 140.243]
2025-09-13 12:37:28,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 880.0, 248.0, 146.0, 119.0, 258.0, 601.0, 365.0, 634.0, 75.0]
2025-09-13 12:37:28,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 34 minutes, 24 seconds)
2025-09-13 12:48:31,351 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:48:31,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:50:03,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 828.91785 ± 820.308
2025-09-13 12:50:03,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [22.44516, 870.4921, 545.612, 1239.1002, 384.92584, 317.08002, 106.112915, 2633.9856, 230.97694, 1938.4481]
2025-09-13 12:50:03,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 316.0, 202.0, 482.0, 159.0, 137.0, 74.0, 934.0, 100.0, 635.0]
2025-09-13 12:50:03,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 22 minutes, 30 seconds)
2025-09-13 13:00:27,510 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:00:27,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:01:44,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 685.85120 ± 642.348
2025-09-13 13:01:44,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.515785, 259.63153, 268.22806, 857.0482, 637.02875, 1122.7822, 455.9297, 14.239888, 2276.1125, 948.99493]
2025-09-13 13:01:44,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 129.0, 115.0, 317.0, 236.0, 403.0, 194.0, 19.0, 843.0, 356.0]
2025-09-13 13:01:44,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 12 minutes, 14 seconds)
2025-09-13 13:11:53,246 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:11:53,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:12:52,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 490.47348 ± 788.657
2025-09-13 13:12:52,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [156.15125, 937.40393, 404.20978, 2725.568, 298.74155, 24.594242, 95.195854, 149.9766, 91.40775, 21.486004]
2025-09-13 13:12:52,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 358.0, 158.0, 1000.0, 134.0, 24.0, 57.0, 93.0, 56.0, 29.0]
2025-09-13 13:12:52,374 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 3 minutes, 38 seconds)
2025-09-13 13:23:27,809 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:23:27,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:24:28,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 528.03101 ± 382.219
2025-09-13 13:24:28,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [108.99257, 887.09705, 254.71431, 202.80673, 938.0912, 528.5426, 701.8656, 1271.8619, 194.2793, 192.05885]
2025-09-13 13:24:28,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 326.0, 108.0, 93.0, 346.0, 196.0, 259.0, 474.0, 89.0, 92.0]
2025-09-13 13:24:28,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 53 minutes, 48 seconds)
2025-09-13 13:35:08,673 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:35:08,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:36:19,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 656.69379 ± 254.310
2025-09-13 13:36:19,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [662.79175, 789.3388, 535.2452, 787.56885, 864.2948, 1141.9531, 515.9335, 719.82294, 233.37419, 316.61465]
2025-09-13 13:36:19,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [246.0, 290.0, 189.0, 280.0, 312.0, 403.0, 197.0, 251.0, 109.0, 139.0]
2025-09-13 13:36:19,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 37 minutes, 49 seconds)
2025-09-13 13:47:02,208 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:47:02,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:48:08,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 574.61346 ± 764.358
2025-09-13 13:48:08,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [858.7717, 191.42491, 540.6778, 246.04428, 595.5307, 278.21066, 2741.3958, 16.761635, 263.5532, 13.764087]
2025-09-13 13:48:08,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [309.0, 93.0, 210.0, 105.0, 238.0, 119.0, 983.0, 19.0, 115.0, 16.0]
2025-09-13 13:48:08,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 19 minutes, 31 seconds)
2025-09-13 13:58:41,642 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:58:41,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:59:52,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 621.22693 ± 516.250
2025-09-13 13:59:52,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.811748, 180.58362, 46.014236, 785.5646, 918.36194, 127.60609, 1668.6179, 1168.563, 560.3268, 740.8199]
2025-09-13 13:59:52,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 106.0, 53.0, 278.0, 335.0, 82.0, 583.0, 415.0, 208.0, 275.0]
2025-09-13 13:59:52,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 8 minutes, 16 seconds)
2025-09-13 14:10:24,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:10:24,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:11:59,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 869.51917 ± 828.728
2025-09-13 14:11:59,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [754.70776, 494.45468, 207.57751, 228.60251, 2960.857, 124.25895, 243.83066, 931.8413, 1250.9404, 1498.1204]
2025-09-13 14:11:59,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [272.0, 196.0, 132.0, 111.0, 1000.0, 66.0, 138.0, 336.0, 445.0, 513.0]
2025-09-13 14:11:59,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 4 minutes, 44 seconds)
2025-09-13 14:22:35,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:22:35,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:23:45,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 649.23383 ± 614.028
2025-09-13 14:23:45,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [617.9311, 70.63667, 2145.395, 652.3043, 330.4936, 985.30756, 1127.2864, 19.667582, 82.6046, 460.7114]
2025-09-13 14:23:45,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 43.0, 693.0, 239.0, 133.0, 311.0, 389.0, 27.0, 59.0, 192.0]
2025-09-13 14:23:45,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 54 minutes, 9 seconds)
2025-09-13 14:34:20,251 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:34:20,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:35:06,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 411.10736 ± 457.997
2025-09-13 14:35:06,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.241764, 717.5787, 27.373566, 395.49515, 124.38813, 18.919094, 1148.7642, 166.03339, 179.88719, 1311.3925]
2025-09-13 14:35:06,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 282.0, 25.0, 151.0, 63.0, 22.0, 405.0, 101.0, 82.0, 435.0]
2025-09-13 14:35:06,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 38 minutes, 34 seconds)
2025-09-13 14:45:55,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:45:55,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:47:02,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 607.87292 ± 421.140
2025-09-13 14:47:02,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [948.9908, 966.5424, 718.5002, 1242.6266, 643.27576, 342.4766, 17.390142, 171.44933, 21.35329, 1006.124]
2025-09-13 14:47:02,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [322.0, 349.0, 262.0, 458.0, 231.0, 154.0, 24.0, 99.0, 32.0, 385.0]
2025-09-13 14:47:02,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 27 minutes, 43 seconds)
2025-09-13 14:57:27,930 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:57:27,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:58:13,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 389.45392 ± 588.511
2025-09-13 14:58:13,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [272.06833, 201.77245, 510.69458, 29.955362, 26.25294, 18.430523, 293.4068, 209.92897, 229.6963, 2102.3328]
2025-09-13 14:58:13,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 90.0, 189.0, 29.0, 31.0, 27.0, 135.0, 101.0, 98.0, 696.0]
2025-09-13 14:58:13,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 11 minutes, 50 seconds)
2025-09-13 15:08:44,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:08:44,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:09:14,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 237.45853 ± 138.009
2025-09-13 15:09:14,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [195.87193, 592.48865, 164.12993, 185.55907, 193.3717, 165.77138, 162.71614, 161.15823, 405.28766, 148.23055]
2025-09-13 15:09:14,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 214.0, 76.0, 87.0, 88.0, 79.0, 78.0, 77.0, 156.0, 72.0]
2025-09-13 15:09:14,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 6 hours, 52 minutes, 13 seconds)
2025-09-13 15:19:52,381 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:19:52,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:21:21,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 810.27429 ± 588.782
2025-09-13 15:21:21,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1343.6654, 734.679, 592.1187, 306.85016, 184.9648, 20.499035, 2090.9824, 1155.6523, 613.34296, 1059.9882]
2025-09-13 15:21:21,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [477.0, 265.0, 207.0, 168.0, 99.0, 24.0, 765.0, 399.0, 213.0, 359.0]
2025-09-13 15:21:21,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 43 minutes, 13 seconds)
2025-09-13 15:32:01,901 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:32:01,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:33:31,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 788.55017 ± 872.630
2025-09-13 15:33:31,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1874.4956, 33.015423, 2876.9722, 534.9708, 442.08572, 26.091652, 854.69965, 314.95053, 77.89116, 850.3292]
2025-09-13 15:33:31,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [693.0, 33.0, 1000.0, 201.0, 212.0, 27.0, 302.0, 130.0, 54.0, 322.0]
2025-09-13 15:33:31,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 37 minutes, 10 seconds)
2025-09-13 15:44:27,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:44:27,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:45:30,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 584.92639 ± 385.405
2025-09-13 15:45:30,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1057.7946, 768.32245, 876.36334, 1262.9951, 422.96695, 700.58997, 231.92337, 159.91872, 277.4552, 90.934616]
2025-09-13 15:45:30,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [319.0, 257.0, 301.0, 442.0, 182.0, 241.0, 116.0, 99.0, 114.0, 54.0]
2025-09-13 15:45:30,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 25 minutes, 49 seconds)
2025-09-13 15:56:01,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:56:01,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:57:39,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 952.54608 ± 762.238
2025-09-13 15:57:39,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [480.8928, 1096.5399, 1322.2317, 2839.6992, 631.9955, 19.528683, 24.004957, 1004.07336, 1125.5514, 980.9432]
2025-09-13 15:57:39,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 384.0, 477.0, 1000.0, 228.0, 23.0, 31.0, 326.0, 351.0, 334.0]
2025-09-13 15:57:39,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 20 minutes, 20 seconds)
2025-09-13 16:08:37,903 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:08:37,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:09:22,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 369.22775 ± 318.767
2025-09-13 16:09:22,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [390.93167, 388.87048, 157.77232, 15.0092125, 114.96571, 1016.52515, 287.1054, 286.57535, 127.364456, 907.15765]
2025-09-13 16:09:22,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 153.0, 96.0, 23.0, 73.0, 344.0, 126.0, 125.0, 65.0, 344.0]
2025-09-13 16:09:22,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 12 minutes, 50 seconds)
2025-09-13 16:19:24,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:19:24,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:20:51,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 760.94086 ± 931.614
2025-09-13 16:20:51,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2848.2327, 232.5493, 18.591211, 831.8129, 742.0661, 552.0394, 96.8656, 13.117883, 2176.245, 97.88872]
2025-09-13 16:20:51,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 105.0, 22.0, 325.0, 297.0, 258.0, 73.0, 20.0, 781.0, 67.0]
2025-09-13 16:20:52,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 57 minutes, 4 seconds)
2025-09-13 16:31:29,519 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:31:29,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:32:34,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 599.51086 ± 570.000
2025-09-13 16:32:34,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [96.81041, 14.309474, 18.378065, 1313.7253, 1145.6882, 184.13092, 1378.7051, 1221.5469, 15.516251, 606.2979]
2025-09-13 16:32:34,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 20.0, 26.0, 457.0, 403.0, 86.0, 477.0, 432.0, 23.0, 223.0]
2025-09-13 16:32:34,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 42 minutes, 27 seconds)
2025-09-13 16:43:01,146 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:43:01,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:44:12,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 624.66553 ± 875.960
2025-09-13 16:44:12,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [205.49413, 437.92105, 24.465942, 26.102146, 31.469479, 2406.3882, 182.259, 120.31392, 526.8877, 2285.3535]
2025-09-13 16:44:12,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 183.0, 31.0, 28.0, 33.0, 831.0, 88.0, 85.0, 200.0, 802.0]
2025-09-13 16:44:12,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 28 minutes, 43 seconds)
2025-09-13 16:55:01,709 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:55:01,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:55:50,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 436.07819 ± 468.731
2025-09-13 16:55:50,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [129.01976, 744.3798, 310.44128, 95.90763, 13.603803, 21.789356, 731.73944, 1568.9666, 101.18127, 643.7528]
2025-09-13 16:55:50,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 253.0, 129.0, 71.0, 20.0, 22.0, 257.0, 500.0, 56.0, 239.0]
2025-09-13 16:55:50,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 14 minutes, 9 seconds)
2025-09-13 17:06:18,873 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:06:18,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:07:12,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 454.13312 ± 404.346
2025-09-13 17:07:12,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [317.38882, 336.68756, 725.57684, 577.7901, 78.7073, 445.01236, 20.317316, 1476.0295, 81.06877, 482.75262]
2025-09-13 17:07:12,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 145.0, 264.0, 224.0, 46.0, 178.0, 22.0, 528.0, 51.0, 187.0]
2025-09-13 17:07:12,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 42 seconds)
2025-09-13 17:17:45,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:17:45,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:18:55,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 647.54285 ± 469.708
2025-09-13 17:18:55,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [828.1054, 121.36202, 1569.2003, 279.21088, 242.55748, 477.20105, 192.35968, 488.56226, 1184.3623, 1092.5073]
2025-09-13 17:18:55,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 62.0, 573.0, 120.0, 107.0, 184.0, 108.0, 181.0, 409.0, 328.0]
2025-09-13 17:18:55,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 50 minutes, 17 seconds)
2025-09-13 17:29:36,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:29:36,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:30:24,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 406.27759 ± 350.448
2025-09-13 17:30:24,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [298.0798, 148.23886, 257.81808, 377.596, 661.5137, 1308.6831, 522.8257, 347.63568, 118.03092, 22.353924]
2025-09-13 17:30:24,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 73.0, 129.0, 146.0, 237.0, 468.0, 205.0, 153.0, 61.0, 25.0]
2025-09-13 17:30:24,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 37 minutes, 37 seconds)
2025-09-13 17:40:48,904 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:40:48,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:42:12,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 747.35651 ± 719.022
2025-09-13 17:42:12,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [199.75604, 1271.4485, 763.34973, 176.00734, 931.00946, 2374.7373, 1359.9377, 18.716122, 358.97034, 19.63275]
2025-09-13 17:42:12,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 464.0, 331.0, 109.0, 287.0, 817.0, 487.0, 25.0, 191.0, 22.0]
2025-09-13 17:42:12,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 26 minutes, 46 seconds)
2025-09-13 17:53:06,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:53:06,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:54:08,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 564.98230 ± 430.547
2025-09-13 17:54:08,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [388.9989, 26.308315, 1403.1345, 692.2282, 850.0851, 17.623384, 645.78357, 897.11975, 21.069408, 707.47186]
2025-09-13 17:54:08,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 31.0, 488.0, 256.0, 313.0, 22.0, 238.0, 327.0, 26.0, 266.0]
2025-09-13 17:54:08,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 16 minutes, 33 seconds)
2025-09-13 18:04:36,032 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:04:36,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:06:04,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 867.97687 ± 479.535
2025-09-13 18:06:04,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1400.4354, 369.29886, 1575.8104, 720.7255, 1487.1034, 354.76553, 168.87198, 866.75916, 634.3471, 1101.651]
2025-09-13 18:06:04,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [495.0, 157.0, 568.0, 250.0, 483.0, 138.0, 81.0, 306.0, 230.0, 340.0]
2025-09-13 18:06:04,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 7 minutes, 13 seconds)
2025-09-13 18:17:28,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:17:28,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:19:07,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 988.85144 ± 522.433
2025-09-13 18:19:07,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1398.7606, 689.1842, 1569.6727, 236.80208, 264.44266, 853.9252, 880.0632, 1555.5333, 1761.246, 678.8849]
2025-09-13 18:19:07,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [487.0, 248.0, 531.0, 106.0, 124.0, 272.0, 318.0, 479.0, 569.0, 240.0]
2025-09-13 18:19:07,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (988.85) for latency ExtremeSparseL4U32
2025-09-13 18:19:07,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 48 seconds)
2025-09-13 18:29:34,199 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:29:34,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:30:52,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 720.25159 ± 473.480
2025-09-13 18:30:52,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1501.7507, 509.86066, 624.39294, 739.3914, 434.32764, 552.3966, 115.25213, 1447.8602, 1167.1245, 110.15866]
2025-09-13 18:30:52,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [510.0, 194.0, 229.0, 241.0, 169.0, 230.0, 78.0, 508.0, 413.0, 74.0]
2025-09-13 18:30:52,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 49 minutes, 45 seconds)
2025-09-13 18:40:50,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:40:50,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:42:13,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 760.20276 ± 716.847
2025-09-13 18:42:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [23.722225, 1075.1832, 1601.4368, 278.0373, 721.37085, 233.57092, 28.590769, 407.8203, 2375.5435, 856.7518]
2025-09-13 18:42:13,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 336.0, 555.0, 121.0, 269.0, 101.0, 30.0, 158.0, 839.0, 319.0]
2025-09-13 18:42:13,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 36 minutes, 3 seconds)
2025-09-13 18:52:44,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:52:44,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:53:49,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 580.53699 ± 430.927
2025-09-13 18:53:49,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1132.6427, 26.982035, 22.014671, 475.0373, 188.74518, 1028.6904, 267.6153, 543.78815, 991.3191, 1128.5349]
2025-09-13 18:53:49,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [392.0, 31.0, 31.0, 203.0, 87.0, 365.0, 120.0, 226.0, 352.0, 418.0]
2025-09-13 18:53:49,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 22 minutes, 54 seconds)
2025-09-13 19:04:22,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:04:22,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:06:30,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1251.35535 ± 579.440
2025-09-13 19:06:30,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [620.96704, 1807.6376, 1757.0068, 715.03217, 704.9074, 2246.5386, 1219.9418, 1359.5635, 1639.7837, 442.1746]
2025-09-13 19:06:30,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [246.0, 620.0, 579.0, 249.0, 275.0, 776.0, 418.0, 445.0, 552.0, 167.0]
2025-09-13 19:06:30,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1251.36) for latency ExtremeSparseL4U32
2025-09-13 19:06:30,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 13 minutes, 24 seconds)
2025-09-13 19:17:25,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:17:25,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:18:32,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 626.01044 ± 581.269
2025-09-13 19:18:32,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [435.34628, 1656.8243, 1485.6859, 1038.5474, 975.4396, 109.81406, 22.880966, 70.626915, 237.6214, 227.31676]
2025-09-13 19:18:32,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 535.0, 491.0, 328.0, 343.0, 82.0, 25.0, 51.0, 129.0, 99.0]
2025-09-13 19:18:32,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 58 minutes, 15 seconds)
2025-09-13 19:29:02,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:29:02,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:30:04,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 567.50555 ± 526.811
2025-09-13 19:30:04,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [802.23663, 728.48376, 405.61807, 775.7478, 17.681633, 334.00284, 125.011086, 1929.8097, 434.5115, 121.952095]
2025-09-13 19:30:04,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [286.0, 262.0, 181.0, 275.0, 22.0, 146.0, 86.0, 607.0, 170.0, 63.0]
2025-09-13 19:30:05,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 45 minutes, 47 seconds)
2025-09-13 19:40:49,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:40:49,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:41:36,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 386.49564 ± 222.019
2025-09-13 19:41:36,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [573.43317, 106.49065, 144.21555, 755.5544, 339.4472, 374.8254, 274.54892, 327.81924, 755.61554, 213.00636]
2025-09-13 19:41:36,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [209.0, 65.0, 107.0, 277.0, 139.0, 164.0, 121.0, 139.0, 305.0, 96.0]
2025-09-13 19:41:36,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 34 minutes, 25 seconds)
2025-09-13 19:51:48,505 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:51:48,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:52:44,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 509.67587 ± 442.075
2025-09-13 19:52:44,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1106.6738, 906.8624, 132.99008, 124.43197, 536.11163, 474.87738, 355.5093, 18.581936, 90.19823, 1350.5222]
2025-09-13 19:52:44,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [393.0, 289.0, 67.0, 73.0, 201.0, 179.0, 149.0, 26.0, 62.0, 473.0]
2025-09-13 19:52:44,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 21 minutes, 23 seconds)
2025-09-13 20:03:24,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:03:24,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:05:06,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 939.93732 ± 823.498
2025-09-13 20:05:06,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [294.43677, 2438.3467, 1325.5146, 495.43323, 193.4488, 17.910604, 21.42565, 1046.7769, 1571.8146, 1994.2656]
2025-09-13 20:05:06,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 851.0, 477.0, 203.0, 91.0, 24.0, 25.0, 383.0, 571.0, 673.0]
2025-09-13 20:05:06,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 8 minutes, 55 seconds)
2025-09-13 20:15:44,319 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:15:44,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:16:37,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 468.67343 ± 337.185
2025-09-13 20:16:37,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [160.97363, 1170.3511, 449.30362, 148.9554, 453.5099, 198.55746, 333.34106, 1006.0291, 544.60254, 221.11024]
2025-09-13 20:16:37,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 406.0, 167.0, 73.0, 178.0, 112.0, 161.0, 343.0, 200.0, 100.0]
2025-09-13 20:16:37,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 56 minutes, 9 seconds)
2025-09-13 20:27:15,466 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:27:15,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:28:49,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 884.67053 ± 618.712
2025-09-13 20:28:49,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1634.1777, 1109.5271, 1692.9568, 229.59549, 1420.7145, 282.74512, 232.92355, 1440.157, 778.9438, 24.964584]
2025-09-13 20:28:49,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [563.0, 387.0, 545.0, 105.0, 492.0, 116.0, 103.0, 508.0, 279.0, 27.0]
2025-09-13 20:28:49,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 45 minutes, 44 seconds)
2025-09-13 20:39:16,505 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:39:16,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:41:03,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1087.31555 ± 791.379
2025-09-13 20:41:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [889.0909, 1126.8132, 1170.9703, 611.6072, 1077.7819, 121.88199, 1342.6259, 1327.6382, 3086.1182, 118.62825]
2025-09-13 20:41:03,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 356.0, 352.0, 222.0, 369.0, 78.0, 453.0, 424.0, 1000.0, 61.0]
2025-09-13 20:41:04,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 35 minutes, 7 seconds)
2025-09-13 20:51:37,434 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:51:37,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:53:39,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1177.11743 ± 835.782
2025-09-13 20:53:39,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2073.0952, 104.869644, 1184.3572, 443.12097, 1990.3429, 2512.967, 1409.3096, 1598.9694, 301.74023, 152.40315]
2025-09-13 20:53:39,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [706.0, 56.0, 431.0, 169.0, 689.0, 840.0, 488.0, 551.0, 125.0, 93.0]
2025-09-13 20:53:39,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 25 minutes, 17 seconds)
2025-09-13 21:04:17,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:04:17,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:06:41,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 1434.93591 ± 805.187
2025-09-13 21:06:41,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [200.12717, 2677.2588, 1893.7952, 69.8181, 2565.5469, 1529.6399, 1405.4662, 1405.1685, 1391.5776, 1210.9604]
2025-09-13 21:06:41,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 878.0, 679.0, 46.0, 898.0, 514.0, 460.0, 454.0, 470.0, 413.0]
2025-09-13 21:06:41,551 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1226 [INFO]: New best (1434.94) for latency ExtremeSparseL4U32
2025-09-13 21:06:41,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 13 minutes, 53 seconds)
2025-09-13 21:17:20,091 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:17:20,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:19:05,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 961.19647 ± 605.357
2025-09-13 21:19:05,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [249.7742, 1525.639, 28.619595, 2216.8074, 449.63474, 788.7229, 982.9386, 1040.589, 1085.1871, 1244.052]
2025-09-13 21:19:05,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 561.0, 31.0, 762.0, 201.0, 303.0, 355.0, 383.0, 388.0, 436.0]
2025-09-13 21:19:05,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 2 minutes, 28 seconds)
2025-09-13 21:29:54,043 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:29:54,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:31:07,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 707.62500 ± 848.578
2025-09-13 21:31:07,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [216.79552, 697.3102, 444.3536, 3082.67, 255.99094, 154.82655, 85.861534, 1053.6035, 841.78845, 243.04982]
2025-09-13 21:31:07,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 238.0, 171.0, 1000.0, 108.0, 94.0, 51.0, 337.0, 251.0, 105.0]
2025-09-13 21:31:07,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 49 minutes, 50 seconds)
2025-09-13 21:41:24,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:41:24,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:42:10,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 395.04260 ± 241.224
2025-09-13 21:42:10,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [376.69208, 610.8844, 492.68604, 840.961, 496.7172, 33.114964, 265.35052, 568.6357, 114.67488, 150.70956]
2025-09-13 21:42:10,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 231.0, 193.0, 289.0, 186.0, 32.0, 115.0, 220.0, 60.0, 74.0]
2025-09-13 21:42:10,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 39 seconds)
2025-09-13 21:53:02,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:53:02,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:54:32,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 892.47766 ± 681.688
2025-09-13 21:54:32,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [908.3371, 25.796377, 16.39959, 1846.0483, 664.7646, 857.8863, 1371.7852, 31.32239, 1293.1697, 1909.2667]
2025-09-13 21:54:32,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [292.0, 31.0, 23.0, 580.0, 234.0, 310.0, 478.0, 33.0, 435.0, 638.0]
2025-09-13 21:54:32,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 21 seconds)
2025-09-13 22:05:17,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:05:17,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:06:45,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 840.20831 ± 545.548
2025-09-13 22:06:45,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [1062.3398, 91.94027, 1360.9762, 644.56555, 1347.5304, 1835.6823, 1008.95557, 182.24417, 527.81177, 340.03708]
2025-09-13 22:06:45,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [375.0, 73.0, 471.0, 230.0, 493.0, 600.0, 345.0, 85.0, 193.0, 140.0]
2025-09-13 22:06:45,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes)
2025-09-13 22:17:17,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:17:17,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:18:22,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 580.93811 ± 474.373
2025-09-13 22:18:22,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1222 [DEBUG]: All rewards: [79.550896, 1164.8014, 1145.7178, 1411.3757, 90.62994, 101.8699, 338.70486, 506.81372, 705.075, 264.84113]
2025-09-13 22:18:22,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 400.0, 391.0, 477.0, 69.0, 72.0, 137.0, 193.0, 260.0, 119.0]
2025-09-13 22:18:22,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-hopper):1251 [DEBUG]: Training session finished
