2025-09-12 22:29:49,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 22:29:49,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-12 22:29:49,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x148a8d984190>}
2025-09-12 22:29:49,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1111 [DEBUG]: using device: cuda
2025-09-12 22:29:49,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1133 [INFO]: Creating new trainer
2025-09-12 22:29:50,217 baseline-mbpac-noiseperc0-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-12 22:29:50,217 baseline-mbpac-noiseperc0-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-12 22:29:50,224 baseline-mbpac-noiseperc0-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-12 22:29:51,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1194 [DEBUG]: Starting training session...
2025-09-12 22:29:51,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 1/100
2025-09-12 22:41:08,201 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:41:08,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:41:17,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 45.07393 ± 24.899
2025-09-12 22:41:17,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [13.310298, 69.01023, 59.790592, 62.891045, 72.07855, 56.787548, 69.87303, 13.513106, 17.458221, 16.026718]
2025-09-12 22:41:17,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 40.0, 35.0, 37.0, 47.0, 33.0, 41.0, 16.0, 18.0, 17.0]
2025-09-12 22:41:17,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (45.07) for latency ExtremeSparseL4U32
2025-09-12 22:41:17,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 50 minutes, 53 seconds)
2025-09-12 22:52:19,917 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:52:19,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 22:52:42,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 108.56069 ± 86.029
2025-09-12 22:52:42,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [287.6479, 40.684067, 44.01093, 93.38254, 205.43298, 30.287422, 44.523075, 37.348442, 103.063866, 199.22565]
2025-09-12 22:52:42,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 44.0, 45.0, 74.0, 132.0, 28.0, 37.0, 32.0, 72.0, 123.0]
2025-09-12 22:52:42,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (108.56) for latency ExtremeSparseL4U32
2025-09-12 22:52:42,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 38 minutes, 59 seconds)
2025-09-12 23:03:24,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:03:24,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:03:49,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 103.44704 ± 76.631
2025-09-12 23:03:49,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [49.474674, 53.332417, 68.95172, 82.10289, 7.991787, 113.08924, 295.83075, 77.14789, 115.22569, 171.32327]
2025-09-12 23:03:49,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 43.0, 61.0, 57.0, 17.0, 135.0, 171.0, 65.0, 95.0, 132.0]
2025-09-12 23:03:49,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 18 minutes, 6 seconds)
2025-09-12 23:14:45,095 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:14:45,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:15:09,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 125.68855 ± 31.255
2025-09-12 23:15:09,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [112.83528, 163.28236, 144.51852, 112.18897, 142.94472, 57.53796, 123.71698, 168.05496, 134.89015, 96.91555]
2025-09-12 23:15:09,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 118.0, 84.0, 77.0, 84.0, 39.0, 81.0, 92.0, 98.0, 64.0]
2025-09-12 23:15:09,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (125.69) for latency ExtremeSparseL4U32
2025-09-12 23:15:09,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 7 minutes, 9 seconds)
2025-09-12 23:26:03,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:26:03,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:26:36,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 198.16441 ± 109.080
2025-09-12 23:26:36,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [44.59933, 228.39604, 267.2621, 279.96417, 71.51481, 348.69086, 95.074455, 346.4928, 209.0771, 90.572334]
2025-09-12 23:26:36,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 117.0, 134.0, 146.0, 46.0, 174.0, 54.0, 162.0, 115.0, 54.0]
2025-09-12 23:26:36,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (198.16) for latency ExtremeSparseL4U32
2025-09-12 23:26:36,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 58 minutes)
2025-09-12 23:37:30,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:37:30,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:38:07,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 248.68127 ± 65.075
2025-09-12 23:38:07,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [333.79434, 283.08273, 335.69327, 169.52495, 300.06744, 257.8346, 235.14029, 258.4787, 148.81462, 164.38174]
2025-09-12 23:38:07,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 146.0, 160.0, 89.0, 125.0, 115.0, 108.0, 137.0, 96.0, 102.0]
2025-09-12 23:38:07,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (248.68) for latency ExtremeSparseL4U32
2025-09-12 23:38:07,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 48 minutes, 30 seconds)
2025-09-12 23:48:55,358 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 23:48:55,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-12 23:49:25,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 197.77982 ± 44.889
2025-09-12 23:49:25,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [201.8737, 244.45001, 146.82332, 223.54082, 215.3563, 229.16678, 234.251, 232.99944, 116.30382, 133.0329]
2025-09-12 23:49:25,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 127.0, 75.0, 120.0, 108.0, 120.0, 120.0, 113.0, 63.0, 79.0]
2025-09-12 23:49:25,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 17 hours, 35 minutes, 10 seconds)
2025-09-13 00:00:29,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:00:29,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:01:06,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 188.03725 ± 67.628
2025-09-13 00:01:06,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [164.00812, 147.22906, 223.5176, 109.01816, 164.92387, 257.81915, 351.7375, 146.15903, 168.65308, 147.30688]
2025-09-13 00:01:06,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 98.0, 169.0, 80.0, 120.0, 192.0, 194.0, 92.0, 106.0, 94.0]
2025-09-13 00:01:06,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 17 hours, 34 minutes, 4 seconds)
2025-09-13 00:11:48,250 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:11:48,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:12:34,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 358.01624 ± 146.099
2025-09-13 00:12:34,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [227.82603, 149.35083, 599.39087, 570.6442, 330.65518, 150.19725, 374.34283, 414.7653, 359.59134, 403.3985]
2025-09-13 00:12:34,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 80.0, 247.0, 210.0, 146.0, 75.0, 178.0, 157.0, 149.0, 176.0]
2025-09-13 00:12:34,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (358.02) for latency ExtremeSparseL4U32
2025-09-13 00:12:34,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 24 minutes, 45 seconds)
2025-09-13 00:23:40,886 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:23:40,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:24:06,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 167.06721 ± 65.475
2025-09-13 00:24:06,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [127.67506, 97.75996, 176.8462, 107.80138, 159.01108, 153.46635, 174.6995, 175.5319, 346.71747, 151.16339]
2025-09-13 00:24:06,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [67.0, 54.0, 87.0, 58.0, 86.0, 80.0, 85.0, 85.0, 155.0, 81.0]
2025-09-13 00:24:06,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 15 minutes, 3 seconds)
2025-09-13 00:34:52,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:34:52,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:35:33,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 293.54865 ± 195.134
2025-09-13 00:35:33,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [215.60541, 724.6539, 247.94197, 485.70413, 98.55536, 256.6115, 262.99255, 98.272064, 460.337, 84.81239]
2025-09-13 00:35:33,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 278.0, 114.0, 212.0, 59.0, 109.0, 138.0, 56.0, 195.0, 50.0]
2025-09-13 00:35:33,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 2 minutes, 22 seconds)
2025-09-13 00:46:37,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:46:37,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:47:43,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 552.31976 ± 347.482
2025-09-13 00:47:43,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [242.47772, 394.73337, 732.69977, 243.71768, 333.77707, 250.20497, 428.24542, 603.8912, 928.6613, 1364.7894]
2025-09-13 00:47:43,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 155.0, 287.0, 106.0, 148.0, 122.0, 179.0, 233.0, 337.0, 503.0]
2025-09-13 00:47:43,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (552.32) for latency ExtremeSparseL4U32
2025-09-13 00:47:43,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 6 minutes, 5 seconds)
2025-09-13 00:58:30,881 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 00:58:30,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 00:59:16,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 334.79498 ± 168.455
2025-09-13 00:59:16,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [429.28333, 281.8714, 117.94175, 276.75192, 347.01065, 701.4926, 450.76874, 193.63469, 426.92365, 122.27112]
2025-09-13 00:59:16,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 153.0, 62.0, 155.0, 159.0, 288.0, 190.0, 89.0, 182.0, 65.0]
2025-09-13 00:59:16,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 16 hours, 51 minutes, 57 seconds)
2025-09-13 01:10:23,330 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:10:23,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:11:16,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 450.95468 ± 312.986
2025-09-13 01:11:16,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [205.46915, 614.10065, 647.2479, 735.77374, 147.2435, 139.83394, 217.01518, 1149.4825, 254.3048, 399.07535]
2025-09-13 01:11:16,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 221.0, 218.0, 248.0, 77.0, 84.0, 115.0, 402.0, 110.0, 172.0]
2025-09-13 01:11:16,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 49 minutes, 42 seconds)
2025-09-13 01:22:01,474 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:22:01,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:23:02,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 514.45984 ± 309.464
2025-09-13 01:23:02,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [524.4286, 988.519, 293.12048, 955.82007, 74.97084, 532.8795, 679.6398, 246.539, 724.5562, 124.125725]
2025-09-13 01:23:02,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [207.0, 331.0, 119.0, 417.0, 63.0, 215.0, 238.0, 134.0, 242.0, 65.0]
2025-09-13 01:23:02,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 41 minutes, 50 seconds)
2025-09-13 01:34:09,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:34:09,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:35:34,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 720.83685 ± 473.948
2025-09-13 01:35:34,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [491.88303, 503.57315, 374.86453, 1645.6505, 209.54286, 1029.082, 1115.5225, 375.1807, 1251.0292, 212.03983]
2025-09-13 01:35:34,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [213.0, 227.0, 149.0, 610.0, 94.0, 387.0, 425.0, 186.0, 459.0, 96.0]
2025-09-13 01:35:34,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (720.84) for latency ExtremeSparseL4U32
2025-09-13 01:35:34,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 48 minutes, 18 seconds)
2025-09-13 01:46:32,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:46:32,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:47:34,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 478.82935 ± 439.672
2025-09-13 01:47:34,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [117.987114, 130.15407, 559.67804, 1298.6328, 488.42636, 445.86624, 89.80436, 83.20929, 285.56046, 1288.9746]
2025-09-13 01:47:34,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 81.0, 243.0, 445.0, 181.0, 199.0, 70.0, 65.0, 119.0, 546.0]
2025-09-13 01:47:34,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 33 minutes, 30 seconds)
2025-09-13 01:58:17,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 01:58:17,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 01:59:20,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 555.33990 ± 418.678
2025-09-13 01:59:20,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [411.21362, 68.33104, 1069.3016, 1418.4436, 906.6775, 284.93887, 646.44904, 255.50809, 241.29785, 251.23798]
2025-09-13 01:59:20,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [160.0, 60.0, 348.0, 457.0, 303.0, 118.0, 238.0, 108.0, 131.0, 126.0]
2025-09-13 01:59:20,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 25 minutes, 10 seconds)
2025-09-13 02:10:30,323 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:10:30,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:11:49,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 702.22351 ± 404.623
2025-09-13 02:11:49,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [791.6907, 822.9593, 110.44491, 736.9818, 772.3353, 937.43713, 1619.0767, 634.8032, 423.58987, 172.91661]
2025-09-13 02:11:49,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [281.0, 288.0, 75.0, 276.0, 266.0, 336.0, 513.0, 236.0, 187.0, 100.0]
2025-09-13 02:11:49,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 20 minutes, 55 seconds)
2025-09-13 02:22:38,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:22:38,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:24:12,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 905.51038 ± 441.245
2025-09-13 02:24:12,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1396.706, 679.37036, 702.41437, 709.4068, 968.5526, 1635.8528, 199.8648, 1508.256, 523.69226, 730.987]
2025-09-13 02:24:12,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [452.0, 244.0, 267.0, 237.0, 310.0, 519.0, 90.0, 541.0, 193.0, 228.0]
2025-09-13 02:24:12,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (905.51) for latency ExtremeSparseL4U32
2025-09-13 02:24:12,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 18 minutes, 41 seconds)
2025-09-13 02:35:25,425 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:35:25,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:36:38,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 694.38538 ± 288.339
2025-09-13 02:36:38,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [953.3426, 841.9696, 1117.8649, 357.4639, 712.93695, 798.1824, 568.7619, 389.31573, 206.61705, 997.39886]
2025-09-13 02:36:38,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [337.0, 279.0, 367.0, 141.0, 261.0, 255.0, 205.0, 147.0, 108.0, 324.0]
2025-09-13 02:36:38,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 4 minutes, 43 seconds)
2025-09-13 02:47:18,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:47:18,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:49:06,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1020.00409 ± 715.885
2025-09-13 02:49:06,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2212.5876, 1182.7246, 1019.3309, 2348.4104, 1099.9027, 898.7599, 630.975, 434.35327, 128.89162, 244.10474]
2025-09-13 02:49:06,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [764.0, 421.0, 343.0, 835.0, 380.0, 319.0, 223.0, 165.0, 66.0, 104.0]
2025-09-13 02:49:06,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1020.00) for latency ExtremeSparseL4U32
2025-09-13 02:49:06,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 59 minutes, 49 seconds)
2025-09-13 03:00:41,553 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:00:41,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:02:06,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 716.03302 ± 559.315
2025-09-13 03:02:06,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1750.7161, 547.79724, 938.4519, 1685.0701, 645.3742, 118.56235, 244.01993, 445.76166, 686.85126, 97.72555]
2025-09-13 03:02:06,229 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [644.0, 203.0, 333.0, 620.0, 242.0, 74.0, 128.0, 178.0, 267.0, 73.0]
2025-09-13 03:02:06,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 6 minutes, 29 seconds)
2025-09-13 03:12:19,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:12:19,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:13:50,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 795.29413 ± 663.965
2025-09-13 03:13:50,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [241.49078, 1390.345, 2380.3577, 904.88495, 396.64563, 1074.4314, 890.9237, 210.8943, 230.84306, 232.1251]
2025-09-13 03:13:50,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 542.0, 834.0, 327.0, 152.0, 417.0, 331.0, 92.0, 119.0, 100.0]
2025-09-13 03:13:50,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 42 minutes, 44 seconds)
2025-09-13 03:24:58,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:24:58,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:26:52,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1020.15234 ± 692.934
2025-09-13 03:26:52,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [944.25116, 243.08426, 1533.2896, 2788.8354, 972.6639, 656.05273, 1081.4536, 233.20229, 869.6122, 879.07776]
2025-09-13 03:26:52,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [350.0, 110.0, 546.0, 1000.0, 335.0, 252.0, 402.0, 135.0, 333.0, 335.0]
2025-09-13 03:26:52,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1020.15) for latency ExtremeSparseL4U32
2025-09-13 03:26:52,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 40 minutes)
2025-09-13 03:37:45,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:37:45,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:38:52,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 644.24323 ± 257.583
2025-09-13 03:38:52,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [445.85623, 508.0604, 656.1615, 710.7529, 469.99347, 885.7527, 177.84435, 552.19684, 1077.4071, 958.40656]
2025-09-13 03:38:52,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [166.0, 194.0, 237.0, 260.0, 171.0, 296.0, 83.0, 191.0, 352.0, 304.0]
2025-09-13 03:38:52,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 21 minutes, 7 seconds)
2025-09-13 03:50:26,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:50:26,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:52:25,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1009.52771 ± 728.009
2025-09-13 03:52:25,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1780.2604, 921.5484, 656.0131, 1474.2515, 2477.392, 1440.9513, 688.38696, 174.04713, 109.41486, 373.01215]
2025-09-13 03:52:25,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [667.0, 341.0, 285.0, 556.0, 982.0, 525.0, 283.0, 95.0, 83.0, 173.0]
2025-09-13 03:52:25,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 15 hours, 24 minutes, 30 seconds)
2025-09-13 04:03:24,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:03:24,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:04:51,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 824.95526 ± 789.592
2025-09-13 04:04:51,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [542.45807, 81.7301, 259.8589, 2300.8186, 88.87232, 633.1998, 703.7447, 2109.2166, 110.98572, 1418.6685]
2025-09-13 04:04:51,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [197.0, 69.0, 104.0, 772.0, 73.0, 244.0, 236.0, 703.0, 78.0, 438.0]
2025-09-13 04:04:51,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 15 hours, 3 minutes, 46 seconds)
2025-09-13 04:15:51,877 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:15:51,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:17:06,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 625.82629 ± 607.990
2025-09-13 04:17:06,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [214.46527, 138.84456, 937.9808, 641.46173, 427.44028, 774.1852, 753.3669, 105.18806, 2212.0156, 53.314537]
2025-09-13 04:17:06,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 81.0, 345.0, 240.0, 166.0, 297.0, 287.0, 57.0, 776.0, 55.0]
2025-09-13 04:17:06,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 58 minutes, 15 seconds)
2025-09-13 04:27:35,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:27:35,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:28:36,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 520.63055 ± 412.836
2025-09-13 04:28:36,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [191.52354, 182.4505, 185.5808, 183.96375, 1074.1321, 1223.5166, 137.27335, 601.60767, 1050.7734, 375.48386]
2025-09-13 04:28:36,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 85.0, 89.0, 86.0, 373.0, 477.0, 90.0, 231.0, 360.0, 152.0]
2025-09-13 04:28:36,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 24 minutes, 20 seconds)
2025-09-13 04:39:23,018 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:39:23,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:41:17,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1024.85828 ± 626.769
2025-09-13 04:41:17,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [153.87181, 1671.4756, 623.2834, 2319.2566, 1475.7539, 1142.9679, 690.9258, 1078.67, 796.2162, 296.1623]
2025-09-13 04:41:17,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 586.0, 226.0, 811.0, 521.0, 425.0, 278.0, 403.0, 295.0, 129.0]
2025-09-13 04:41:17,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1024.86) for latency ExtremeSparseL4U32
2025-09-13 04:41:17,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 21 minutes, 21 seconds)
2025-09-13 04:52:59,018 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:52:59,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:55:27,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1344.03296 ± 793.019
2025-09-13 04:55:27,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [430.92227, 1715.8151, 449.88617, 1115.4957, 1501.4229, 1845.0768, 2839.902, 2297.544, 631.2611, 613.00354]
2025-09-13 04:55:27,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 620.0, 200.0, 401.0, 528.0, 636.0, 1000.0, 803.0, 240.0, 240.0]
2025-09-13 04:55:27,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1344.03) for latency ExtremeSparseL4U32
2025-09-13 04:55:27,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 17 minutes, 13 seconds)
2025-09-13 05:05:41,740 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:05:41,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:07:35,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1121.77686 ± 648.575
2025-09-13 05:07:35,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [623.26556, 510.3696, 1825.6292, 1420.3219, 2241.0352, 1280.0256, 1660.1935, 1013.26276, 56.919544, 586.7451]
2025-09-13 05:07:35,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [231.0, 189.0, 590.0, 470.0, 741.0, 419.0, 533.0, 350.0, 41.0, 212.0]
2025-09-13 05:07:35,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 14 hours, 37 seconds)
2025-09-13 05:18:35,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:18:35,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:20:22,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 977.61902 ± 991.801
2025-09-13 05:20:22,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [414.2971, 2918.4863, 172.37512, 89.22746, 910.10547, 2863.6733, 262.72675, 609.46313, 671.9736, 863.86224]
2025-09-13 05:20:22,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 1000.0, 84.0, 73.0, 339.0, 946.0, 114.0, 263.0, 265.0, 314.0]
2025-09-13 05:20:22,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 55 minutes, 4 seconds)
2025-09-13 05:31:39,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:31:39,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:33:45,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1205.96069 ± 813.154
2025-09-13 05:33:45,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1792.2963, 1173.7571, 1279.2863, 178.95688, 942.92126, 2986.9414, 1773.6709, 415.6676, 202.1447, 1313.9637]
2025-09-13 05:33:45,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [602.0, 426.0, 462.0, 83.0, 339.0, 1000.0, 579.0, 163.0, 111.0, 456.0]
2025-09-13 05:33:45,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 14 hours, 6 minutes, 57 seconds)
2025-09-13 05:45:08,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:45:11,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:47:01,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1032.88416 ± 940.301
2025-09-13 05:47:01,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2029.5402, 510.73776, 86.43451, 162.25244, 3009.3464, 1265.709, 1290.9562, 239.41122, 1645.1205, 89.334175]
2025-09-13 05:47:01,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [712.0, 199.0, 56.0, 91.0, 1000.0, 453.0, 397.0, 107.0, 585.0, 73.0]
2025-09-13 05:47:01,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 14 hours, 1 minute, 18 seconds)
2025-09-13 05:57:17,695 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:57:17,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:58:42,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 732.87122 ± 818.051
2025-09-13 05:58:42,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1183.5043, 588.971, 84.33542, 661.1522, 3005.403, 168.52971, 207.25266, 445.62405, 286.0382, 697.90204]
2025-09-13 05:58:42,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [424.0, 220.0, 63.0, 262.0, 1000.0, 81.0, 98.0, 196.0, 144.0, 266.0]
2025-09-13 05:58:42,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 16 minutes, 47 seconds)
2025-09-13 06:09:37,958 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:09:37,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:11:35,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1128.69739 ± 776.502
2025-09-13 06:11:35,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [415.41458, 1993.5188, 1218.772, 815.57074, 1788.7114, 93.91777, 2563.1482, 1042.7006, 90.96005, 1264.2601]
2025-09-13 06:11:35,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [169.0, 645.0, 408.0, 273.0, 618.0, 68.0, 855.0, 381.0, 51.0, 443.0]
2025-09-13 06:11:35,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 13 hours, 13 minutes, 28 seconds)
2025-09-13 06:23:11,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:23:11,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:25:50,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1508.71301 ± 1033.137
2025-09-13 06:25:50,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [725.57367, 3023.736, 2551.701, 1268.1813, 2983.0571, 2114.43, 698.76984, 205.595, 1304.7191, 211.36658]
2025-09-13 06:25:50,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [251.0, 966.0, 891.0, 448.0, 1000.0, 748.0, 255.0, 105.0, 487.0, 118.0]
2025-09-13 06:25:50,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1508.71) for latency ExtremeSparseL4U32
2025-09-13 06:25:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 13 hours, 18 minutes, 44 seconds)
2025-09-13 06:36:07,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:36:07,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:37:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 493.95477 ± 458.610
2025-09-13 06:37:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [102.26572, 175.7474, 1015.67194, 144.31384, 40.78549, 627.62585, 766.6419, 1506.7821, 140.16039, 419.55347]
2025-09-13 06:37:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 81.0, 369.0, 90.0, 33.0, 228.0, 289.0, 504.0, 71.0, 172.0]
2025-09-13 06:37:05,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 39 minutes, 58 seconds)
2025-09-13 06:48:46,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:48:46,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:50:04,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 763.46875 ± 574.735
2025-09-13 06:50:04,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [790.7452, 100.695625, 1560.6124, 355.07175, 824.20685, 1520.6283, 302.31213, 390.50964, 1635.7764, 154.1287]
2025-09-13 06:50:04,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [263.0, 72.0, 513.0, 164.0, 273.0, 459.0, 126.0, 163.0, 522.0, 75.0]
2025-09-13 06:50:05,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 24 minutes, 9 seconds)
2025-09-13 07:00:59,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:00:59,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:02:43,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1009.77911 ± 575.029
2025-09-13 07:02:43,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [241.18594, 1876.3988, 1009.1269, 1829.9504, 625.9308, 1421.16, 301.6762, 948.59357, 1398.8314, 444.93716]
2025-09-13 07:02:43,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 599.0, 349.0, 552.0, 226.0, 471.0, 120.0, 318.0, 470.0, 168.0]
2025-09-13 07:02:43,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 22 minutes, 38 seconds)
2025-09-13 07:13:44,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:13:44,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:15:29,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1007.53241 ± 858.418
2025-09-13 07:15:29,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1316.3545, 1579.0048, 2998.9062, 183.93782, 1615.7323, 609.23035, 103.42227, 817.0372, 775.14636, 76.55319]
2025-09-13 07:15:29,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [444.0, 519.0, 937.0, 83.0, 516.0, 259.0, 74.0, 304.0, 291.0, 65.0]
2025-09-13 07:15:29,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 8 minutes, 26 seconds)
2025-09-13 07:26:20,287 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:26:20,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:28:16,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1082.17358 ± 720.157
2025-09-13 07:28:16,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1658.6644, 772.9672, 1551.7253, 400.00424, 125.543915, 1886.4851, 219.15488, 1620.5596, 2125.9604, 460.66916]
2025-09-13 07:28:16,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [565.0, 295.0, 556.0, 155.0, 84.0, 612.0, 126.0, 544.0, 751.0, 174.0]
2025-09-13 07:28:16,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 39 minutes, 14 seconds)
2025-09-13 07:38:55,075 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:38:55,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:40:43,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1066.63611 ± 1314.349
2025-09-13 07:40:43,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [436.58115, 40.67702, 3048.871, 3007.9114, 100.505356, 475.99756, 133.66986, 73.54689, 216.5057, 3132.0957]
2025-09-13 07:40:43,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 33.0, 1000.0, 971.0, 56.0, 175.0, 85.0, 56.0, 97.0, 1000.0]
2025-09-13 07:40:43,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 39 minutes, 57 seconds)
2025-09-13 07:51:58,103 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:51:58,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:54:36,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1513.30481 ± 1107.063
2025-09-13 07:54:36,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [193.46909, 191.1528, 181.55405, 1344.9214, 1552.98, 2903.7617, 819.0555, 2932.356, 1978.2977, 3035.4993]
2025-09-13 07:54:36,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 101.0, 98.0, 485.0, 551.0, 1000.0, 301.0, 1000.0, 680.0, 1000.0]
2025-09-13 07:54:36,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1513.30) for latency ExtremeSparseL4U32
2025-09-13 07:54:36,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 36 minutes, 55 seconds)
2025-09-13 08:05:08,423 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:05:08,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:06:25,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 722.97766 ± 916.120
2025-09-13 08:06:25,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [60.081856, 31.221176, 103.74723, 122.3972, 3049.9568, 994.0236, 930.62524, 1521.088, 80.82794, 335.80743]
2025-09-13 08:06:25,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 28.0, 80.0, 63.0, 1000.0, 339.0, 306.0, 499.0, 68.0, 143.0]
2025-09-13 08:06:25,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 15 minutes, 16 seconds)
2025-09-13 08:17:28,061 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:17:28,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:20:13,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1622.33032 ± 1054.583
2025-09-13 08:20:13,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2658.1333, 1899.7579, 162.18889, 1055.9584, 202.11928, 3068.3909, 1823.8837, 1670.9553, 3120.0469, 561.8705]
2025-09-13 08:20:13,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [906.0, 612.0, 77.0, 388.0, 93.0, 1000.0, 589.0, 553.0, 1000.0, 205.0]
2025-09-13 08:20:13,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (1622.33) for latency ExtremeSparseL4U32
2025-09-13 08:20:13,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 13 minutes, 12 seconds)
2025-09-13 08:31:12,691 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:31:12,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:33:13,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1132.95959 ± 516.634
2025-09-13 08:33:13,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [451.42548, 1475.1216, 1797.6555, 1682.0624, 389.88422, 1339.4152, 1763.7559, 757.90015, 967.90485, 704.47076]
2025-09-13 08:33:13,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 540.0, 641.0, 592.0, 153.0, 459.0, 606.0, 272.0, 327.0, 257.0]
2025-09-13 08:33:13,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 2 minutes, 28 seconds)
2025-09-13 08:44:25,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:44:25,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:48:06,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2069.73389 ± 876.030
2025-09-13 08:48:06,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1602.5994, 2923.4395, 2807.4976, 2203.7766, 994.3496, 2190.6936, 2919.6038, 154.60904, 2846.7031, 2054.0647]
2025-09-13 08:48:06,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [581.0, 1000.0, 971.0, 755.0, 352.0, 747.0, 1000.0, 73.0, 1000.0, 709.0]
2025-09-13 08:48:06,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2069.73) for latency ExtremeSparseL4U32
2025-09-13 08:48:06,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 11 hours, 13 minutes, 49 seconds)
2025-09-13 08:59:35,089 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:59:35,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:01:54,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1300.98047 ± 1035.725
2025-09-13 09:01:54,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2896.05, 1744.2594, 182.87863, 443.90793, 1895.7761, 2917.5742, 59.89096, 306.10757, 779.77576, 1783.5836]
2025-09-13 09:01:54,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 605.0, 97.0, 190.0, 639.0, 1000.0, 47.0, 129.0, 304.0, 648.0]
2025-09-13 09:01:55,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 59 minutes, 34 seconds)
2025-09-13 09:12:20,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:12:20,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:14:55,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1500.59399 ± 1188.866
2025-09-13 09:14:55,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3060.7148, 163.81134, 876.18787, 377.29166, 2487.887, 301.75436, 691.33484, 3112.7004, 927.82544, 3006.4329]
2025-09-13 09:14:55,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 77.0, 327.0, 150.0, 824.0, 145.0, 263.0, 1000.0, 333.0, 1000.0]
2025-09-13 09:14:55,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 57 minutes, 34 seconds)
2025-09-13 09:25:55,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:25:55,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:27:01,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 600.76355 ± 360.311
2025-09-13 09:27:01,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [697.0668, 62.692295, 931.5565, 419.11725, 201.27861, 659.3809, 508.7116, 281.05652, 987.2979, 1259.4768]
2025-09-13 09:27:01,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [250.0, 45.0, 334.0, 164.0, 106.0, 234.0, 193.0, 135.0, 353.0, 428.0]
2025-09-13 09:27:01,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 27 minutes, 56 seconds)
2025-09-13 09:36:52,554 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:36:52,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:39:25,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1651.83789 ± 1108.331
2025-09-13 09:39:25,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [95.61487, 1358.5801, 2210.0542, 2476.1426, 2317.6997, 96.67311, 2823.6292, 1848.649, 3147.5408, 143.79544]
2025-09-13 09:39:25,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 428.0, 700.0, 805.0, 723.0, 60.0, 882.0, 582.0, 1000.0, 72.0]
2025-09-13 09:39:25,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 10 hours, 9 minutes, 4 seconds)
2025-09-13 09:49:35,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:49:35,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:51:40,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1283.90369 ± 1044.337
2025-09-13 09:51:40,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [985.91974, 762.6121, 3164.8254, 427.5329, 2148.8223, 794.9189, 193.65521, 279.77707, 3026.8154, 1054.1578]
2025-09-13 09:51:40,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [347.0, 271.0, 1000.0, 166.0, 746.0, 273.0, 93.0, 118.0, 1000.0, 349.0]
2025-09-13 09:51:40,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 32 minutes, 5 seconds)
2025-09-13 10:02:09,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:02:09,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:03:59,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1108.49390 ± 993.041
2025-09-13 10:03:59,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [454.39197, 113.77512, 2734.9421, 1564.8588, 392.863, 151.17906, 672.23944, 597.9575, 3014.4165, 1388.3168]
2025-09-13 10:03:59,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 85.0, 927.0, 515.0, 158.0, 74.0, 223.0, 214.0, 1000.0, 472.0]
2025-09-13 10:03:59,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 6 minutes, 17 seconds)
2025-09-13 10:14:21,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:14:21,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:15:17,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 548.74164 ± 468.868
2025-09-13 10:15:17,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [435.1173, 1491.2097, 172.1619, 338.12653, 33.447437, 330.43954, 743.04614, 1345.0524, 328.204, 270.6112]
2025-09-13 10:15:17,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [165.0, 510.0, 80.0, 138.0, 29.0, 134.0, 250.0, 415.0, 138.0, 117.0]
2025-09-13 10:15:17,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 39 minutes, 7 seconds)
2025-09-13 10:25:05,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:25:05,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:27:14,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1360.67859 ± 1238.186
2025-09-13 10:27:14,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3106.3228, 1447.5417, 1035.4785, 260.6593, 181.03491, 3092.6055, 135.40192, 108.52454, 3203.1577, 1036.0596]
2025-09-13 10:27:14,450 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [979.0, 490.0, 339.0, 130.0, 88.0, 1000.0, 77.0, 59.0, 1000.0, 343.0]
2025-09-13 10:27:14,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 25 minutes, 51 seconds)
2025-09-13 10:37:56,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:37:56,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:39:54,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1153.30591 ± 714.764
2025-09-13 10:39:54,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [998.65027, 1034.1743, 327.12015, 244.21626, 1048.2056, 917.2314, 2976.906, 1111.3162, 1382.7635, 1492.475]
2025-09-13 10:39:54,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [361.0, 371.0, 157.0, 109.0, 376.0, 317.0, 993.0, 370.0, 481.0, 522.0]
2025-09-13 10:39:54,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 15 minutes, 59 seconds)
2025-09-13 10:49:50,140 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:49:50,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:52:29,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1669.02283 ± 1234.328
2025-09-13 10:52:29,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [164.5198, 1388.019, 2513.455, 162.21999, 3138.3064, 2282.9548, 428.25726, 383.71014, 3068.4524, 3160.334]
2025-09-13 10:52:29,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 480.0, 816.0, 81.0, 1000.0, 727.0, 197.0, 150.0, 1000.0, 1000.0]
2025-09-13 10:52:29,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 6 minutes, 32 seconds)
2025-09-13 11:02:31,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:02:31,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:03:43,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 679.38666 ± 605.199
2025-09-13 11:03:43,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [548.36145, 61.72971, 468.3206, 1328.4923, 1651.6971, 258.28683, 213.11923, 1732.5665, 322.38037, 208.91188]
2025-09-13 11:03:43,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 57.0, 188.0, 453.0, 544.0, 119.0, 116.0, 580.0, 142.0, 97.0]
2025-09-13 11:03:43,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 45 minutes, 51 seconds)
2025-09-13 11:13:58,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:13:58,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:17:05,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1938.66370 ± 919.879
2025-09-13 11:17:05,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2004.4393, 3091.4836, 2232.1428, 1568.0088, 2975.1199, 1946.601, 1026.5442, 3072.003, 1374.4059, 95.887924]
2025-09-13 11:17:05,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [674.0, 1000.0, 717.0, 499.0, 1000.0, 631.0, 348.0, 1000.0, 463.0, 78.0]
2025-09-13 11:17:05,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 49 minutes, 40 seconds)
2025-09-13 11:27:24,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:27:24,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:29:29,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1270.41101 ± 983.701
2025-09-13 11:29:29,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1479.69, 3107.5068, 103.874596, 1804.406, 88.223366, 400.79028, 178.77144, 1952.4734, 2149.5903, 1438.7842]
2025-09-13 11:29:29,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [489.0, 1000.0, 57.0, 601.0, 72.0, 168.0, 84.0, 651.0, 702.0, 473.0]
2025-09-13 11:29:29,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 40 minutes, 39 seconds)
2025-09-13 11:39:59,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:39:59,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:41:08,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 660.51697 ± 441.328
2025-09-13 11:41:08,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1321.9509, 411.68353, 1039.9706, 163.93594, 74.393265, 785.6445, 762.30914, 490.36874, 1339.5292, 215.3835]
2025-09-13 11:41:08,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [471.0, 162.0, 355.0, 78.0, 64.0, 296.0, 267.0, 182.0, 476.0, 95.0]
2025-09-13 11:41:08,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 20 minutes, 56 seconds)
2025-09-13 11:50:53,033 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:50:53,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:52:38,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1070.04492 ± 976.324
2025-09-13 11:52:38,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [112.808105, 176.38126, 584.3735, 713.86475, 1550.5872, 57.4287, 3073.8777, 2480.9207, 680.0102, 1270.1978]
2025-09-13 11:52:38,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 83.0, 225.0, 254.0, 519.0, 55.0, 1000.0, 820.0, 238.0, 442.0]
2025-09-13 11:52:38,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 1 minute, 2 seconds)
2025-09-13 12:02:33,912 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:02:33,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:05:42,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1993.22595 ± 1257.994
2025-09-13 12:05:42,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [233.32964, 76.822784, 3085.8108, 232.77977, 3138.8813, 1718.6842, 2736.9844, 3164.5134, 2428.669, 3115.784]
2025-09-13 12:05:42,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 65.0, 1000.0, 102.0, 1000.0, 548.0, 913.0, 1000.0, 776.0, 1000.0]
2025-09-13 12:05:42,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 1 minute, 32 seconds)
2025-09-13 12:15:54,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:15:54,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:17:42,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1162.98022 ± 1264.249
2025-09-13 12:17:42,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2195.1445, 1830.5621, 68.29843, 855.03864, 3247.567, 3168.1812, 51.957455, 97.19896, 47.861427, 67.99463]
2025-09-13 12:17:42,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [703.0, 567.0, 60.0, 281.0, 1000.0, 984.0, 50.0, 70.0, 49.0, 60.0]
2025-09-13 12:17:42,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 40 minutes, 7 seconds)
2025-09-13 12:27:52,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:27:52,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:30:39,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1833.58008 ± 1003.665
2025-09-13 12:30:39,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3290.652, 440.22025, 1783.1877, 3228.1475, 1927.1688, 776.6966, 3179.6184, 1201.3146, 1091.1218, 1417.6733]
2025-09-13 12:30:39,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 160.0, 594.0, 1000.0, 598.0, 258.0, 1000.0, 402.0, 354.0, 493.0]
2025-09-13 12:30:39,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 31 minutes, 28 seconds)
2025-09-13 12:41:18,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:41:18,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:44:49,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2369.67700 ± 957.635
2025-09-13 12:44:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1756.0999, 2093.3457, 1426.8135, 2940.145, 3186.4739, 3342.7603, 3283.5752, 2537.717, 199.15675, 2930.6838]
2025-09-13 12:44:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [586.0, 658.0, 445.0, 906.0, 1000.0, 1000.0, 1000.0, 782.0, 111.0, 918.0]
2025-09-13 12:44:49,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2369.68) for latency ExtremeSparseL4U32
2025-09-13 12:44:49,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 34 minutes, 46 seconds)
2025-09-13 12:54:24,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:54:24,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:56:01,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 988.33801 ± 853.916
2025-09-13 12:56:01,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [810.80884, 454.79935, 752.98096, 783.25543, 166.35112, 3244.8113, 703.80664, 1048.141, 243.48608, 1674.9393]
2025-09-13 12:56:01,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [275.0, 168.0, 282.0, 267.0, 106.0, 1000.0, 253.0, 358.0, 117.0, 551.0]
2025-09-13 12:56:01,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 20 minutes, 21 seconds)
2025-09-13 13:05:59,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:05:59,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:08:38,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1748.24097 ± 1072.436
2025-09-13 13:08:38,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [784.0069, 150.22401, 905.7807, 2077.8647, 3211.6196, 1409.6134, 1081.6549, 3158.7444, 3295.2927, 1407.6074]
2025-09-13 13:08:38,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 73.0, 319.0, 659.0, 1000.0, 445.0, 379.0, 1000.0, 1000.0, 456.0]
2025-09-13 13:08:38,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 5 minutes, 1 second)
2025-09-13 13:19:28,706 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:19:28,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:21:24,249 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1226.62671 ± 1297.717
2025-09-13 13:21:24,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3131.6763, 186.61606, 3209.9468, 3221.8635, 254.82555, 480.1978, 195.08212, 174.3517, 706.72644, 704.98145]
2025-09-13 13:21:24,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 86.0, 1000.0, 1000.0, 105.0, 188.0, 95.0, 82.0, 254.0, 251.0]
2025-09-13 13:21:24,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 56 minutes, 40 seconds)
2025-09-13 13:30:59,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:30:59,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:33:05,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1321.22754 ± 893.117
2025-09-13 13:33:05,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1551.9225, 1005.3447, 1318.04, 1321.046, 2619.1594, 309.2628, 801.77954, 1002.59814, 135.19252, 3147.9312]
2025-09-13 13:33:05,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [471.0, 338.0, 442.0, 422.0, 819.0, 146.0, 296.0, 331.0, 89.0, 962.0]
2025-09-13 13:33:05,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 37 minutes, 6 seconds)
2025-09-13 13:43:33,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:43:33,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:45:30,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1280.36243 ± 1108.338
2025-09-13 13:45:30,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [113.76255, 1823.6941, 2949.2356, 1010.2448, 79.41973, 2522.526, 2821.447, 108.922935, 278.70615, 1095.6658]
2025-09-13 13:45:30,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 548.0, 898.0, 337.0, 46.0, 808.0, 868.0, 76.0, 117.0, 349.0]
2025-09-13 13:45:30,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 15 minutes, 32 seconds)
2025-09-13 13:55:23,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:55:23,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:57:22,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1264.88037 ± 958.128
2025-09-13 13:57:22,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [769.88556, 1091.4541, 108.17358, 163.30644, 2783.2712, 1686.536, 2744.5393, 1798.3416, 1410.9946, 92.301414]
2025-09-13 13:57:22,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [263.0, 357.0, 59.0, 93.0, 851.0, 521.0, 871.0, 570.0, 439.0, 55.0]
2025-09-13 13:57:22,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 6 minutes, 42 seconds)
2025-09-13 14:07:57,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:07:57,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:10:48,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1857.78772 ± 1235.296
2025-09-13 14:10:48,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [290.05048, 1668.5026, 3137.1438, 158.70177, 3111.521, 1566.1115, 3223.4724, 2017.3491, 178.27806, 3226.7476]
2025-09-13 14:10:48,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 536.0, 1000.0, 78.0, 1000.0, 507.0, 1000.0, 636.0, 91.0, 1000.0]
2025-09-13 14:10:48,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 58 minutes, 21 seconds)
2025-09-13 14:20:30,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:20:30,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:22:47,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1462.56323 ± 1071.810
2025-09-13 14:22:47,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [911.04346, 1746.8583, 134.41295, 1267.7343, 3249.609, 2092.0542, 235.65245, 836.2202, 3313.1062, 838.9411]
2025-09-13 14:22:47,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [302.0, 566.0, 87.0, 405.0, 1000.0, 650.0, 119.0, 280.0, 1000.0, 277.0]
2025-09-13 14:22:47,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 42 minutes, 21 seconds)
2025-09-13 14:32:57,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:32:57,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:35:57,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2039.86255 ± 1076.702
2025-09-13 14:35:57,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3376.9954, 755.15173, 1711.1091, 793.7002, 2091.5432, 3305.1597, 1741.93, 460.8667, 2816.9502, 3345.2183]
2025-09-13 14:35:57,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 251.0, 523.0, 260.0, 634.0, 1000.0, 530.0, 163.0, 831.0, 1000.0]
2025-09-13 14:35:57,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 36 minutes, 39 seconds)
2025-09-13 14:46:10,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:46:10,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:49:02,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1863.25757 ± 1256.983
2025-09-13 14:49:02,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3229.9915, 1013.3227, 3171.2139, 3326.1792, 3266.9646, 189.26416, 1049.2899, 2223.4692, 77.57086, 1085.3114]
2025-09-13 14:49:02,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 340.0, 1000.0, 1000.0, 1000.0, 86.0, 353.0, 703.0, 52.0, 371.0]
2025-09-13 14:49:02,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 26 minutes, 51 seconds)
2025-09-13 14:59:15,181 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:59:15,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:02:02,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1816.49512 ± 1288.223
2025-09-13 15:02:02,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1958.8956, 3304.5085, 408.8095, 548.6973, 3260.4243, 250.40984, 1397.6348, 480.08545, 3243.4714, 3312.0137]
2025-09-13 15:02:02,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [596.0, 1000.0, 152.0, 202.0, 1000.0, 105.0, 457.0, 196.0, 1000.0, 1000.0]
2025-09-13 15:02:02,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 18 minutes, 39 seconds)
2025-09-13 15:12:34,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:12:34,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:15:02,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1644.44299 ± 1121.591
2025-09-13 15:15:02,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [106.6535, 2250.5159, 1426.0721, 3261.6387, 787.5751, 3402.5767, 761.6505, 243.11023, 2416.1106, 1788.5261]
2025-09-13 15:15:02,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 675.0, 444.0, 1000.0, 262.0, 1000.0, 254.0, 124.0, 749.0, 567.0]
2025-09-13 15:15:02,917 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 4 minutes, 7 seconds)
2025-09-13 15:24:53,015 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:24:53,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:27:13,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1523.78345 ± 985.703
2025-09-13 15:27:13,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1777.7852, 48.98537, 763.11395, 419.65414, 1834.2103, 3193.8567, 2860.9434, 2166.1597, 762.2467, 1410.8793]
2025-09-13 15:27:13,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [570.0, 37.0, 279.0, 159.0, 584.0, 948.0, 875.0, 675.0, 268.0, 455.0]
2025-09-13 15:27:13,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 52 minutes)
2025-09-13 15:37:38,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:37:38,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:40:20,795 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1823.66565 ± 1092.563
2025-09-13 15:40:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2656.3762, 779.00867, 1554.0491, 128.6205, 1903.1757, 3364.1292, 2157.9204, 2095.6858, 275.68475, 3322.0073]
2025-09-13 15:40:20,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [804.0, 267.0, 501.0, 72.0, 596.0, 1000.0, 633.0, 658.0, 115.0, 1000.0]
2025-09-13 15:40:20,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 38 minutes, 53 seconds)
2025-09-13 15:50:32,675 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:50:32,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:53:56,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2350.33081 ± 1232.405
2025-09-13 15:53:56,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [763.4754, 3400.9658, 1735.8439, 3158.691, 322.70557, 3393.2576, 3407.4314, 3451.7615, 3087.8413, 781.33453]
2025-09-13 15:53:56,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [263.0, 1000.0, 528.0, 994.0, 147.0, 1000.0, 1000.0, 1000.0, 905.0, 275.0]
2025-09-13 15:53:56,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 27 minutes, 40 seconds)
2025-09-13 16:03:51,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:03:51,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:06:19,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1638.03455 ± 1226.458
2025-09-13 16:06:19,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3050.752, 1277.9532, 1592.2798, 1886.7917, 108.65607, 151.93542, 3418.4988, 1323.0433, 3365.171, 205.26512]
2025-09-13 16:06:19,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [925.0, 429.0, 506.0, 590.0, 70.0, 93.0, 1000.0, 421.0, 1000.0, 93.0]
2025-09-13 16:06:19,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 12 minutes, 50 seconds)
2025-09-13 16:17:10,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:17:10,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:20:32,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2231.02197 ± 1117.325
2025-09-13 16:20:32,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2393.6533, 2174.8623, 3318.9546, 75.43571, 3370.754, 3390.0305, 1855.7144, 3387.6565, 811.98456, 1531.173]
2025-09-13 16:20:32,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [709.0, 678.0, 1000.0, 45.0, 1000.0, 1000.0, 588.0, 1000.0, 274.0, 493.0]
2025-09-13 16:20:32,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 3 minutes, 22 seconds)
2025-09-13 16:30:41,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:30:41,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:32:03,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 827.54895 ± 789.944
2025-09-13 16:32:03,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [1995.1655, 69.34861, 57.01069, 1486.9822, 866.9631, 214.24086, 2000.6624, 1441.4795, 83.51772, 60.11889]
2025-09-13 16:32:03,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [590.0, 61.0, 55.0, 468.0, 291.0, 116.0, 636.0, 454.0, 69.0, 56.0]
2025-09-13 16:32:03,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 48 minutes, 32 seconds)
2025-09-13 16:42:04,148 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:42:04,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:44:41,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1755.05920 ± 1314.777
2025-09-13 16:44:41,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [519.3756, 1493.55, 1852.1119, 2462.5874, 49.914433, 615.4848, 3470.669, 3436.3928, 3436.9106, 213.59544]
2025-09-13 16:44:41,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 462.0, 585.0, 743.0, 50.0, 232.0, 1000.0, 1000.0, 1000.0, 100.0]
2025-09-13 16:44:41,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 34 minutes, 26 seconds)
2025-09-13 16:54:40,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:54:40,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:57:52,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2198.34692 ± 1099.399
2025-09-13 16:57:52,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [67.165504, 3295.105, 3394.1023, 1004.6835, 1943.297, 2181.4429, 3361.4802, 1934.0918, 3380.1033, 1421.9955]
2025-09-13 16:57:52,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 1000.0, 1000.0, 323.0, 585.0, 662.0, 1000.0, 603.0, 1000.0, 448.0]
2025-09-13 16:57:52,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 20 minutes, 40 seconds)
2025-09-13 17:08:02,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:08:02,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:11:24,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2280.62598 ± 1406.021
2025-09-13 17:11:24,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3328.198, 2083.9067, 3283.5125, 208.115, 285.09943, 3371.6726, 3339.8213, 3393.8562, 3375.8682, 136.20853]
2025-09-13 17:11:24,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 637.0, 1000.0, 93.0, 127.0, 1000.0, 1000.0, 1000.0, 1000.0, 68.0]
2025-09-13 17:11:24,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 10 minutes, 11 seconds)
2025-09-13 17:22:19,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:22:19,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:24:10,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1184.02271 ± 1035.891
2025-09-13 17:24:10,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [2365.737, 104.40538, 323.91806, 3398.1099, 1375.7195, 1116.1774, 117.30059, 1071.9076, 1775.4521, 191.50005]
2025-09-13 17:24:10,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [705.0, 74.0, 151.0, 1000.0, 414.0, 351.0, 80.0, 360.0, 553.0, 88.0]
2025-09-13 17:24:10,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 54 minutes, 32 seconds)
2025-09-13 17:34:00,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:34:00,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:37:36,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2432.00952 ± 936.820
2025-09-13 17:37:36,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3061.952, 2777.3591, 1793.5299, 1198.4305, 3367.431, 3456.962, 2710.3328, 1862.3202, 3406.9778, 684.80035]
2025-09-13 17:37:36,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [898.0, 813.0, 560.0, 399.0, 1000.0, 1000.0, 801.0, 558.0, 1000.0, 237.0]
2025-09-13 17:37:36,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2432.01) for latency ExtremeSparseL4U32
2025-09-13 17:37:36,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 44 minutes, 52 seconds)
2025-09-13 17:47:37,975 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:47:37,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:49:41,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1343.23047 ± 1353.525
2025-09-13 17:49:41,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3100.8193, 950.78156, 64.45567, 3396.345, 1415.8947, 424.9264, 98.9697, 156.69911, 366.44342, 3456.9702]
2025-09-13 17:49:41,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [893.0, 329.0, 55.0, 1000.0, 437.0, 160.0, 62.0, 98.0, 166.0, 1000.0]
2025-09-13 17:49:41,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 30 minutes, 59 seconds)
2025-09-13 17:59:56,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:59:56,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:04:06,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2947.10010 ± 929.705
2025-09-13 18:04:06,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3422.4116, 350.7894, 2345.6758, 3292.116, 3453.0742, 3492.2622, 3469.111, 2897.7664, 3331.6448, 3416.1497]
2025-09-13 18:04:06,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 138.0, 711.0, 1000.0, 1000.0, 1000.0, 1000.0, 847.0, 1000.0, 1000.0]
2025-09-13 18:04:06,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1226 [INFO]: New best (2947.10) for latency ExtremeSparseL4U32
2025-09-13 18:04:06,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 19 minutes, 28 seconds)
2025-09-13 18:14:47,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:14:47,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:17:31,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1839.17908 ± 1167.579
2025-09-13 18:17:31,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3461.2085, 2583.9678, 1065.869, 451.8518, 3414.2725, 2787.9285, 470.39664, 1793.7494, 232.14844, 2130.399]
2025-09-13 18:17:31,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 763.0, 356.0, 175.0, 1000.0, 813.0, 195.0, 555.0, 118.0, 639.0]
2025-09-13 18:17:31,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 6 minutes, 6 seconds)
2025-09-13 18:27:12,522 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:27:12,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:30:43,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2339.51123 ± 1218.013
2025-09-13 18:30:43,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [55.71323, 1124.1042, 3380.0427, 3381.028, 3349.4019, 2944.2383, 3277.0164, 782.5262, 3324.2803, 1776.7627]
2025-09-13 18:30:43,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 392.0, 1000.0, 1000.0, 1000.0, 903.0, 1000.0, 265.0, 1000.0, 543.0]
2025-09-13 18:30:43,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 53 minutes, 14 seconds)
2025-09-13 18:41:10,919 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:41:10,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:43:30,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1510.96240 ± 1224.098
2025-09-13 18:43:30,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [110.35585, 3255.429, 593.0847, 3362.7642, 2836.6316, 1096.8473, 288.73373, 2169.294, 1213.2542, 183.22995]
2025-09-13 18:43:30,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 1000.0, 213.0, 1000.0, 839.0, 366.0, 124.0, 675.0, 405.0, 83.0]
2025-09-13 18:43:31,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 39 minutes, 32 seconds)
2025-09-13 18:53:36,909 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:53:36,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:56:51,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2306.57471 ± 1182.146
2025-09-13 18:56:51,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [925.97314, 2119.378, 3236.691, 3137.234, 2301.4478, 560.89374, 3452.1538, 433.93323, 3476.7156, 3421.3274]
2025-09-13 18:56:51,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [291.0, 619.0, 937.0, 917.0, 681.0, 196.0, 1000.0, 160.0, 1000.0, 1000.0]
2025-09-13 18:56:51,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 26 minutes, 52 seconds)
2025-09-13 19:07:25,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:07:25,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:11:06,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 2495.34937 ± 1163.871
2025-09-13 19:11:06,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [3377.033, 2237.5903, 553.45294, 297.65015, 3298.2725, 3352.1106, 1778.2072, 3324.607, 3361.0708, 3373.5002]
2025-09-13 19:11:06,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 683.0, 215.0, 130.0, 1000.0, 1000.0, 543.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:11:06,719 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 13 minutes, 24 seconds)
2025-09-13 19:21:29,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:21:29,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:23:35,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1221 [DEBUG]: Total Reward: 1406.66846 ± 1278.264
2025-09-13 19:23:35,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1222 [DEBUG]: All rewards: [216.52322, 142.7748, 851.12134, 935.89166, 182.83533, 3413.157, 3198.5027, 1117.834, 726.5364, 3281.5076]
2025-09-13 19:23:35,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 88.0, 279.0, 323.0, 83.0, 1000.0, 943.0, 355.0, 241.0, 1000.0]
2025-09-13 19:23:35,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-hopper):1251 [DEBUG]: Training session finished
