2025-09-13 02:48:01,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 02:48:01,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-hopper/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 02:48:01,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14dcbd759550>}
2025-09-13 02:48:01,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1111 [DEBUG]: using device: cuda
2025-09-13 02:48:01,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1133 [INFO]: Creating new trainer
2025-09-13 02:48:01,257 baseline-mbpac-noiseperc15-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-09-13 02:48:01,257 baseline-mbpac-noiseperc15-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 02:48:01,264 baseline-mbpac-noiseperc15-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 02:48:02,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1194 [DEBUG]: Starting training session...
2025-09-13 02:48:02,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 1/100
2025-09-13 02:58:44,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 02:58:44,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 02:58:58,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 80.23401 ± 31.365
2025-09-13 02:58:58,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [50.726982, 64.310036, 95.93605, 68.4754, 58.238724, 82.321075, 152.55617, 39.863697, 111.48457, 78.42737]
2025-09-13 02:58:58,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 53.0, 52.0, 47.0, 45.0, 73.0, 26.0, 59.0, 43.0]
2025-09-13 02:58:58,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (80.23) for latency ExtremeSparseL4U32
2025-09-13 02:58:58,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 3 minutes, 2 seconds)
2025-09-13 03:09:31,594 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:09:31,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:09:57,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 147.61786 ± 67.998
2025-09-13 03:09:57,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [269.7516, 82.40765, 210.04926, 121.295815, 235.29912, 58.140377, 116.08228, 182.5555, 106.995514, 93.60154]
2025-09-13 03:09:57,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 58.0, 113.0, 71.0, 111.0, 41.0, 76.0, 100.0, 70.0, 57.0]
2025-09-13 03:09:57,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (147.62) for latency ExtremeSparseL4U32
2025-09-13 03:09:57,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 17 hours, 53 minutes, 58 seconds)
2025-09-13 03:20:34,831 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:20:34,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:20:54,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 110.50077 ± 57.911
2025-09-13 03:20:54,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [73.4597, 175.51385, 70.22733, 56.019737, 123.42906, 128.24188, 138.78256, 49.792465, 234.30147, 55.239616]
2025-09-13 03:20:54,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 103.0, 48.0, 36.0, 66.0, 91.0, 78.0, 33.0, 112.0, 40.0]
2025-09-13 03:20:54,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 17 hours, 43 minutes, 10 seconds)
2025-09-13 03:31:27,465 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:31:27,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:31:54,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 119.80418 ± 73.332
2025-09-13 03:31:54,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [207.45683, 117.681526, 110.39122, 70.43229, 14.968648, 154.5085, 71.78014, 90.84948, 282.4045, 77.56862]
2025-09-13 03:31:54,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 68.0, 88.0, 43.0, 18.0, 103.0, 44.0, 60.0, 246.0, 46.0]
2025-09-13 03:31:54,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 17 hours, 32 minutes, 51 seconds)
2025-09-13 03:42:24,826 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:42:24,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:42:50,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 166.07704 ± 104.447
2025-09-13 03:42:50,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [147.41367, 62.460407, 116.00377, 229.69505, 334.94443, 75.02118, 318.2474, 257.9159, 46.245003, 72.82365]
2025-09-13 03:42:50,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 41.0, 66.0, 122.0, 136.0, 47.0, 138.0, 116.0, 33.0, 45.0]
2025-09-13 03:42:50,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (166.08) for latency ExtremeSparseL4U32
2025-09-13 03:42:50,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 17 hours, 21 minutes, 26 seconds)
2025-09-13 03:53:14,827 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 03:53:14,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 03:53:41,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 161.92238 ± 83.968
2025-09-13 03:53:41,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [238.41963, 100.48456, 20.732933, 287.4825, 142.31647, 98.68423, 255.8435, 247.36266, 107.942055, 119.955154]
2025-09-13 03:53:41,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 61.0, 20.0, 143.0, 76.0, 68.0, 140.0, 123.0, 61.0, 69.0]
2025-09-13 03:53:41,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 17 hours, 8 minutes, 37 seconds)
2025-09-13 04:04:08,950 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:04:08,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:04:38,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 174.49294 ± 93.861
2025-09-13 04:04:38,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [125.81498, 177.52832, 305.6934, 182.37009, 125.377045, 105.77572, 121.07257, 385.7189, 58.080273, 157.49803]
2025-09-13 04:04:38,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 93.0, 141.0, 105.0, 74.0, 62.0, 71.0, 216.0, 35.0, 108.0]
2025-09-13 04:04:38,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (174.49) for latency ExtremeSparseL4U32
2025-09-13 04:04:38,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 16 hours, 57 minutes, 7 seconds)
2025-09-13 04:14:54,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:14:54,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:15:20,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 154.29727 ± 149.373
2025-09-13 04:15:20,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [361.26096, 21.406363, 15.951094, 10.626458, 19.237051, 22.346388, 169.61401, 410.80853, 272.18567, 239.53612]
2025-09-13 04:15:20,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 30.0, 20.0, 14.0, 26.0, 28.0, 110.0, 201.0, 131.0, 151.0]
2025-09-13 04:15:20,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 16 hours, 41 minutes, 26 seconds)
2025-09-13 04:25:51,127 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:25:51,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:26:24,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 229.36800 ± 123.140
2025-09-13 04:26:24,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [303.10934, 334.41443, 369.9975, 261.1232, 13.924339, 22.684282, 147.54692, 306.07584, 197.50996, 337.29398]
2025-09-13 04:26:24,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 149.0, 164.0, 129.0, 15.0, 31.0, 81.0, 142.0, 110.0, 148.0]
2025-09-13 04:26:24,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (229.37) for latency ExtremeSparseL4U32
2025-09-13 04:26:24,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 16 hours, 31 minutes, 59 seconds)
2025-09-13 04:36:44,961 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:36:44,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:37:18,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 229.39175 ± 115.892
2025-09-13 04:37:18,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [405.0197, 445.01004, 109.04588, 127.899445, 205.79083, 247.17004, 183.92778, 286.59457, 68.902504, 214.55664]
2025-09-13 04:37:18,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 198.0, 58.0, 67.0, 108.0, 128.0, 100.0, 132.0, 42.0, 111.0]
2025-09-13 04:37:18,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (229.39) for latency ExtremeSparseL4U32
2025-09-13 04:37:18,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 16 hours, 20 minutes, 25 seconds)
2025-09-13 04:47:45,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:47:45,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:48:10,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 173.74702 ± 114.586
2025-09-13 04:48:10,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [297.57733, 274.76968, 49.84664, 82.14788, 69.84799, 100.066055, 153.56364, 169.6443, 423.0385, 116.96802]
2025-09-13 04:48:10,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 118.0, 30.0, 52.0, 44.0, 59.0, 76.0, 86.0, 161.0, 66.0]
2025-09-13 04:48:10,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 9 minutes, 45 seconds)
2025-09-13 04:58:39,009 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 04:58:39,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 04:59:11,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 219.35599 ± 127.726
2025-09-13 04:59:11,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [127.18568, 329.2136, 165.6648, 258.65488, 128.91754, 265.70612, 159.99718, 508.51642, 231.67058, 18.032978]
2025-09-13 04:59:11,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 138.0, 87.0, 120.0, 67.0, 143.0, 83.0, 214.0, 119.0, 19.0]
2025-09-13 04:59:11,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 16 hours, 10 seconds)
2025-09-13 05:09:41,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:09:41,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:10:15,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 269.01068 ± 219.978
2025-09-13 05:10:15,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [132.80322, 10.58891, 88.093895, 50.74919, 494.76886, 215.74876, 70.21094, 563.7182, 532.11444, 531.31024]
2025-09-13 05:10:15,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 16.0, 56.0, 40.0, 195.0, 101.0, 46.0, 224.0, 202.0, 187.0]
2025-09-13 05:10:15,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (269.01) for latency ExtremeSparseL4U32
2025-09-13 05:10:15,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 55 minutes, 41 seconds)
2025-09-13 05:20:27,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:20:27,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:20:52,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 175.20181 ± 133.501
2025-09-13 05:20:52,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [145.26585, 119.57667, 11.290343, 155.09581, 485.7367, 144.47098, 309.54443, 120.48685, 19.575089, 240.97542]
2025-09-13 05:20:52,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 64.0, 18.0, 83.0, 201.0, 73.0, 135.0, 63.0, 24.0, 116.0]
2025-09-13 05:20:52,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 36 minutes, 52 seconds)
2025-09-13 05:31:18,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:31:18,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:32:04,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 380.44757 ± 186.100
2025-09-13 05:32:04,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [493.5745, 96.130714, 601.18317, 680.8856, 293.5317, 132.73972, 354.09155, 396.95438, 525.70636, 229.67813]
2025-09-13 05:32:04,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 55.0, 213.0, 246.0, 142.0, 71.0, 163.0, 157.0, 199.0, 110.0]
2025-09-13 05:32:04,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (380.45) for latency ExtremeSparseL4U32
2025-09-13 05:32:04,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 15 hours, 31 minutes, 2 seconds)
2025-09-13 05:42:36,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:42:36,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:43:19,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 337.99655 ± 134.282
2025-09-13 05:43:19,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [474.91516, 390.66595, 164.78256, 341.85742, 317.53757, 423.22058, 573.17255, 89.45882, 275.2306, 329.12433]
2025-09-13 05:43:19,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [189.0, 163.0, 89.0, 151.0, 134.0, 169.0, 225.0, 50.0, 120.0, 145.0]
2025-09-13 05:43:19,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 15 hours, 26 minutes, 39 seconds)
2025-09-13 05:53:35,967 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:53:35,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:54:06,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 231.12178 ± 180.561
2025-09-13 05:54:06,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [277.17584, 30.658836, 26.79093, 20.591385, 284.0158, 538.7231, 517.17535, 101.42177, 247.25136, 267.41345]
2025-09-13 05:54:06,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 32.0, 30.0, 25.0, 139.0, 213.0, 192.0, 58.0, 115.0, 122.0]
2025-09-13 05:54:06,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 15 hours, 11 minutes, 45 seconds)
2025-09-13 06:04:33,412 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:04:33,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:05:33,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 459.39728 ± 301.645
2025-09-13 06:05:33,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [82.64974, 638.0039, 910.8443, 511.97534, 153.67247, 430.15543, 371.18954, 1021.6855, 158.78091, 315.01596]
2025-09-13 06:05:33,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [48.0, 238.0, 383.0, 222.0, 90.0, 184.0, 164.0, 447.0, 82.0, 146.0]
2025-09-13 06:05:33,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (459.40) for latency ExtremeSparseL4U32
2025-09-13 06:05:33,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 15 hours, 6 minutes, 46 seconds)
2025-09-13 06:16:04,694 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:16:04,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:16:53,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 368.21066 ± 366.394
2025-09-13 06:16:53,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1090.8328, 19.007694, 22.994183, 185.64505, 385.2152, 323.65884, 44.272396, 362.24704, 1016.28815, 231.94553]
2025-09-13 06:16:53,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [450.0, 19.0, 23.0, 106.0, 162.0, 142.0, 34.0, 175.0, 357.0, 113.0]
2025-09-13 06:16:53,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 7 minutes, 18 seconds)
2025-09-13 06:27:12,656 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:27:12,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:27:41,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 214.53548 ± 183.581
2025-09-13 06:27:41,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [263.37866, 96.46221, 455.99203, 502.91846, 175.18211, 26.359556, 15.902762, 457.49277, 24.323208, 127.34267]
2025-09-13 06:27:41,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [114.0, 53.0, 187.0, 174.0, 83.0, 25.0, 25.0, 184.0, 26.0, 78.0]
2025-09-13 06:27:41,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 49 minutes, 45 seconds)
2025-09-13 06:38:31,895 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:38:31,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:39:19,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 361.07449 ± 212.994
2025-09-13 06:39:19,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.7277355, 410.06433, 809.6319, 159.29503, 562.93274, 421.6607, 230.17645, 452.13867, 229.58467, 321.5327]
2025-09-13 06:39:19,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 173.0, 312.0, 82.0, 221.0, 193.0, 111.0, 207.0, 102.0, 151.0]
2025-09-13 06:39:19,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 44 minutes, 48 seconds)
2025-09-13 06:49:27,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:49:27,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:49:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 199.06265 ± 209.084
2025-09-13 06:49:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [195.47131, 65.505, 15.00153, 324.15076, 17.439838, 753.5541, 170.97629, 54.155582, 258.5986, 135.7734]
2025-09-13 06:49:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 39.0, 26.0, 137.0, 21.0, 314.0, 84.0, 40.0, 124.0, 83.0]
2025-09-13 06:49:56,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 14 hours, 30 minutes, 53 seconds)
2025-09-13 07:00:15,841 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:00:15,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:01:09,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 423.92520 ± 391.025
2025-09-13 07:01:09,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [252.64359, 225.65953, 34.20476, 22.483152, 1072.9491, 327.57483, 1223.804, 179.94669, 539.74786, 360.23892]
2025-09-13 07:01:09,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 106.0, 46.0, 25.0, 410.0, 140.0, 474.0, 83.0, 235.0, 157.0]
2025-09-13 07:01:09,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 14 hours, 16 minutes, 12 seconds)
2025-09-13 07:11:31,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:11:31,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:12:09,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 301.74908 ± 424.778
2025-09-13 07:12:09,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [394.34644, 1478.2031, 66.96623, 21.434708, 178.7824, 17.63288, 179.25877, 536.35065, 13.82429, 130.69148]
2025-09-13 07:12:09,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 514.0, 47.0, 28.0, 87.0, 24.0, 89.0, 208.0, 19.0, 68.0]
2025-09-13 07:12:09,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 14 hours, 4 seconds)
2025-09-13 07:22:42,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:22:42,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:23:39,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 467.77979 ± 267.215
2025-09-13 07:23:39,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [694.8839, 663.69, 322.7861, 318.9373, 713.54456, 839.9606, 377.20377, 91.67612, 638.2053, 16.910006]
2025-09-13 07:23:39,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [261.0, 251.0, 136.0, 138.0, 274.0, 336.0, 156.0, 68.0, 250.0, 22.0]
2025-09-13 07:23:39,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (467.78) for latency ExtremeSparseL4U32
2025-09-13 07:23:39,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 59 minutes, 37 seconds)
2025-09-13 07:34:06,831 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:34:06,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:35:07,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 494.34415 ± 448.379
2025-09-13 07:35:07,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [256.68094, 281.06686, 139.7923, 583.3638, 452.16687, 954.7775, 211.56491, 1642.9353, 229.83397, 191.25854]
2025-09-13 07:35:07,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 121.0, 71.0, 242.0, 181.0, 358.0, 97.0, 563.0, 109.0, 88.0]
2025-09-13 07:35:07,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (494.34) for latency ExtremeSparseL4U32
2025-09-13 07:35:07,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 45 minutes, 55 seconds)
2025-09-13 07:45:43,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:45:43,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:46:23,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 309.72168 ± 329.519
2025-09-13 07:46:23,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [138.41855, 21.607058, 107.47447, 280.90878, 1151.8289, 407.26685, 219.12749, 96.04939, 56.467022, 618.06824]
2025-09-13 07:46:23,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 30.0, 60.0, 130.0, 418.0, 165.0, 98.0, 52.0, 53.0, 227.0]
2025-09-13 07:46:23,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 44 minutes, 11 seconds)
2025-09-13 07:56:42,384 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:56:42,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:57:24,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 326.99933 ± 294.706
2025-09-13 07:57:24,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [476.0674, 625.7789, 153.94841, 14.840541, 347.5882, 12.007847, 21.752987, 951.0502, 179.6998, 487.2591]
2025-09-13 07:57:24,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 232.0, 75.0, 24.0, 165.0, 19.0, 29.0, 371.0, 83.0, 188.0]
2025-09-13 07:57:24,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 29 minutes, 57 seconds)
2025-09-13 08:07:45,138 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:07:45,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:08:41,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 484.76089 ± 348.306
2025-09-13 08:08:41,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [369.5192, 86.43277, 146.91716, 1121.519, 825.3851, 10.311529, 793.1116, 616.9647, 260.25858, 617.18884]
2025-09-13 08:08:41,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 50.0, 76.0, 386.0, 302.0, 14.0, 287.0, 236.0, 112.0, 230.0]
2025-09-13 08:08:41,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 22 minutes, 46 seconds)
2025-09-13 08:19:09,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:19:09,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:19:44,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 280.52881 ± 240.893
2025-09-13 08:19:44,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.989935, 461.8355, 10.676488, 10.815821, 218.13673, 100.76061, 562.8219, 725.14655, 272.01874, 423.0857]
2025-09-13 08:19:44,417 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 184.0, 16.0, 13.0, 107.0, 72.0, 206.0, 240.0, 114.0, 161.0]
2025-09-13 08:19:44,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 13 hours, 5 minutes, 2 seconds)
2025-09-13 08:30:12,856 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:30:12,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:31:21,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 564.93768 ± 367.967
2025-09-13 08:31:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1321.7255, 306.7743, 793.9814, 652.5442, 279.4201, 643.2213, 135.72716, 9.816163, 795.8563, 710.3107]
2025-09-13 08:31:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [499.0, 183.0, 301.0, 248.0, 124.0, 260.0, 69.0, 15.0, 297.0, 247.0]
2025-09-13 08:31:21,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (564.94) for latency ExtremeSparseL4U32
2025-09-13 08:31:21,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 55 minutes, 49 seconds)
2025-09-13 08:42:35,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:42:35,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:43:08,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 276.70572 ± 293.164
2025-09-13 08:43:08,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [179.49785, 640.5615, 348.24255, 16.530788, 18.473999, 14.458332, 746.3479, 705.4192, 71.89287, 25.632147]
2025-09-13 08:43:08,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 236.0, 143.0, 22.0, 24.0, 22.0, 261.0, 243.0, 43.0, 31.0]
2025-09-13 08:43:08,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 51 minutes, 45 seconds)
2025-09-13 08:52:42,094 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:52:42,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:53:30,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 443.11530 ± 466.736
2025-09-13 08:53:30,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [709.95544, 1558.0339, 435.3775, 762.0298, 630.81445, 89.79531, 32.17135, 16.415169, 15.892415, 180.66739]
2025-09-13 08:53:30,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [248.0, 527.0, 172.0, 267.0, 236.0, 51.0, 27.0, 22.0, 21.0, 84.0]
2025-09-13 08:53:30,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 31 minutes, 55 seconds)
2025-09-13 09:03:58,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:03:58,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:04:56,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 503.08359 ± 508.952
2025-09-13 09:04:56,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [186.00714, 1267.9495, 679.99854, 85.59014, 56.734463, 178.13515, 1583.5892, 476.3625, 27.758984, 488.7108]
2025-09-13 09:04:56,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 427.0, 237.0, 53.0, 49.0, 84.0, 564.0, 182.0, 27.0, 226.0]
2025-09-13 09:04:56,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 22 minutes, 40 seconds)
2025-09-13 09:15:14,338 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:15:14,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:15:54,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 327.81403 ± 294.556
2025-09-13 09:15:54,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.6038265, 23.398315, 77.335594, 145.7432, 460.44843, 379.2215, 346.42133, 932.1243, 736.40814, 165.43579]
2025-09-13 09:15:54,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 24.0, 55.0, 77.0, 178.0, 148.0, 138.0, 307.0, 286.0, 83.0]
2025-09-13 09:15:54,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 10 minutes, 5 seconds)
2025-09-13 09:26:31,051 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:26:31,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:27:40,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 603.27576 ± 499.740
2025-09-13 09:27:40,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1322.1107, 18.574183, 624.9815, 581.2803, 411.58322, 18.40869, 95.28876, 1597.0332, 658.1807, 705.3159]
2025-09-13 09:27:40,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [475.0, 20.0, 237.0, 229.0, 166.0, 22.0, 61.0, 578.0, 231.0, 274.0]
2025-09-13 09:27:40,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (603.28) for latency ExtremeSparseL4U32
2025-09-13 09:27:40,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 52 seconds)
2025-09-13 09:38:11,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:38:11,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:38:42,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 234.68904 ± 309.241
2025-09-13 09:38:42,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.526873, 16.196384, 12.212303, 1024.2046, 66.11743, 19.194891, 561.9534, 234.44685, 238.03116, 159.00626]
2025-09-13 09:38:42,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 29.0, 16.0, 391.0, 40.0, 24.0, 231.0, 109.0, 109.0, 81.0]
2025-09-13 09:38:42,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 11 hours, 40 minutes, 11 seconds)
2025-09-13 09:49:07,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:49:07,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:49:57,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 401.06995 ± 643.993
2025-09-13 09:49:57,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [102.219734, 13.763196, 20.703905, 142.77576, 170.53407, 355.3848, 2197.363, 848.64343, 17.525652, 141.78598]
2025-09-13 09:49:57,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 21.0, 29.0, 71.0, 82.0, 147.0, 800.0, 315.0, 27.0, 95.0]
2025-09-13 09:49:57,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 11 hours, 39 minutes, 54 seconds)
2025-09-13 10:00:11,239 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:00:11,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:01:01,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 410.57013 ± 415.151
2025-09-13 10:01:01,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [37.996025, 714.3534, 175.09637, 661.96594, 1431.2455, 80.89411, 75.70858, 233.81647, 536.94104, 157.68355]
2025-09-13 10:01:01,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 292.0, 85.0, 240.0, 494.0, 53.0, 53.0, 112.0, 228.0, 75.0]
2025-09-13 10:01:01,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 24 minutes, 10 seconds)
2025-09-13 10:11:29,599 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:11:29,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:12:30,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 526.65765 ± 334.338
2025-09-13 10:12:30,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [847.6487, 199.3715, 705.1114, 808.01807, 430.0802, 1228.6445, 267.5206, 205.15326, 218.05133, 356.97745]
2025-09-13 10:12:30,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [279.0, 91.0, 234.0, 268.0, 173.0, 427.0, 132.0, 135.0, 99.0, 189.0]
2025-09-13 10:12:30,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 19 minutes, 21 seconds)
2025-09-13 10:22:55,699 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:22:55,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:24:18,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 745.29919 ± 499.596
2025-09-13 10:24:18,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [506.7014, 1549.037, 1665.5455, 482.94604, 296.95163, 909.5288, 77.355415, 766.5911, 882.0606, 316.27426]
2025-09-13 10:24:18,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [199.0, 521.0, 570.0, 192.0, 130.0, 330.0, 55.0, 334.0, 287.0, 131.0]
2025-09-13 10:24:18,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (745.30) for latency ExtremeSparseL4U32
2025-09-13 10:24:18,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 8 minutes, 13 seconds)
2025-09-13 10:34:47,498 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:34:47,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:36:11,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 757.59827 ± 655.335
2025-09-13 10:36:11,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [621.0789, 2484.6204, 1258.6288, 244.1881, 851.91864, 505.95227, 669.74884, 199.96214, 578.8933, 160.99132]
2025-09-13 10:36:11,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [222.0, 917.0, 433.0, 108.0, 270.0, 194.0, 252.0, 95.0, 215.0, 84.0]
2025-09-13 10:36:11,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (757.60) for latency ExtremeSparseL4U32
2025-09-13 10:36:11,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 6 minutes, 44 seconds)
2025-09-13 10:46:32,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:46:32,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:47:16,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 387.18719 ± 577.212
2025-09-13 10:47:16,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1978.157, 223.86479, 810.84186, 155.68315, 26.576822, 13.685997, 37.047436, 294.38394, 20.838903, 310.79214]
2025-09-13 10:47:16,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [671.0, 101.0, 295.0, 78.0, 24.0, 16.0, 39.0, 130.0, 23.0, 137.0]
2025-09-13 10:47:16,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 10 hours, 53 minutes, 25 seconds)
2025-09-13 10:57:41,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:57:41,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:58:45,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 562.65399 ± 500.997
2025-09-13 10:58:45,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [48.80477, 159.97461, 152.05437, 710.0073, 67.99542, 1028.9614, 1134.8538, 229.26524, 1558.6835, 535.94006]
2025-09-13 10:58:45,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 75.0, 74.0, 250.0, 63.0, 395.0, 390.0, 106.0, 538.0, 203.0]
2025-09-13 10:58:45,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 10 hours, 46 minutes, 36 seconds)
2025-09-13 11:09:37,443 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:09:37,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:10:43,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 624.43567 ± 426.967
2025-09-13 11:10:43,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [253.13524, 704.5596, 281.5255, 1577.9874, 961.6758, 500.67935, 1028.4155, 506.96365, 143.72734, 285.68716]
2025-09-13 11:10:43,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 244.0, 118.0, 506.0, 326.0, 179.0, 329.0, 198.0, 80.0, 115.0]
2025-09-13 11:10:44,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 10 hours, 40 minutes, 24 seconds)
2025-09-13 11:21:04,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:21:04,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:22:03,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 488.85214 ± 430.971
2025-09-13 11:22:03,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [683.56384, 269.30548, 493.7866, 163.46185, 93.573, 482.55768, 1488.5093, 240.26382, 13.498105, 960.002]
2025-09-13 11:22:03,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [226.0, 116.0, 225.0, 101.0, 58.0, 249.0, 533.0, 111.0, 15.0, 336.0]
2025-09-13 11:22:03,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 23 minutes, 51 seconds)
2025-09-13 11:32:49,766 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:32:49,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:34:34,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 963.75116 ± 726.230
2025-09-13 11:34:34,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1007.8075, 428.57925, 371.34787, 25.23837, 764.5809, 1376.8722, 1606.191, 2283.9412, 1715.9623, 56.991135]
2025-09-13 11:34:34,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [366.0, 174.0, 152.0, 24.0, 324.0, 470.0, 570.0, 779.0, 618.0, 44.0]
2025-09-13 11:34:34,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (963.75) for latency ExtremeSparseL4U32
2025-09-13 11:34:34,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 18 minutes, 54 seconds)
2025-09-13 11:44:30,114 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:44:30,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:45:25,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 536.50549 ± 479.548
2025-09-13 11:45:25,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1398.3761, 870.80005, 13.958874, 63.29692, 246.0556, 506.3, 157.76709, 895.7102, 1145.187, 67.6024]
2025-09-13 11:45:25,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [443.0, 310.0, 18.0, 52.0, 110.0, 181.0, 77.0, 284.0, 359.0, 40.0]
2025-09-13 11:45:25,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 4 minutes, 44 seconds)
2025-09-13 11:55:56,193 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:55:56,201 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:57:19,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 779.15222 ± 580.045
2025-09-13 11:57:19,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1779.7336, 164.79485, 15.3873005, 21.518787, 1072.1797, 679.3032, 606.766, 1487.3942, 738.3497, 1226.0955]
2025-09-13 11:57:19,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [607.0, 79.0, 17.0, 22.0, 377.0, 253.0, 216.0, 507.0, 242.0, 431.0]
2025-09-13 11:57:19,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 9 hours, 57 minutes, 17 seconds)
2025-09-13 12:07:35,488 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:07:35,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:08:07,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 257.06082 ± 259.595
2025-09-13 12:08:07,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [183.58635, 486.22336, 9.555243, 11.463104, 62.820045, 241.33005, 165.31668, 33.18182, 575.656, 801.4754]
2025-09-13 12:08:07,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 191.0, 29.0, 17.0, 57.0, 107.0, 85.0, 28.0, 208.0, 250.0]
2025-09-13 12:08:07,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 9 hours, 33 minutes, 55 seconds)
2025-09-13 12:18:34,299 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:18:34,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:19:29,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 482.18408 ± 431.308
2025-09-13 12:19:29,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [682.3739, 10.351968, 72.48243, 19.471964, 834.1111, 200.05455, 492.49933, 1429.862, 308.806, 771.82764]
2025-09-13 12:19:29,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [242.0, 14.0, 47.0, 22.0, 310.0, 90.0, 185.0, 501.0, 155.0, 266.0]
2025-09-13 12:19:29,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 9 hours, 22 minutes, 44 seconds)
2025-09-13 12:30:04,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:30:04,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:31:00,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 480.79425 ± 435.146
2025-09-13 12:31:00,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1398.4252, 314.61578, 24.281258, 531.24603, 936.9512, 17.122297, 265.81168, 21.272964, 843.79376, 454.42184]
2025-09-13 12:31:00,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [490.0, 139.0, 26.0, 209.0, 331.0, 21.0, 146.0, 23.0, 309.0, 180.0]
2025-09-13 12:31:00,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 9 hours, 1 minute, 40 seconds)
2025-09-13 12:41:25,641 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:41:25,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:42:51,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 807.63586 ± 441.530
2025-09-13 12:42:51,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [928.0036, 512.84863, 140.01486, 887.3418, 212.73215, 1191.17, 1741.301, 943.2226, 772.9021, 746.82135]
2025-09-13 12:42:51,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [340.0, 184.0, 77.0, 329.0, 121.0, 382.0, 568.0, 296.0, 314.0, 258.0]
2025-09-13 12:42:51,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 8 hours, 59 minutes, 55 seconds)
2025-09-13 12:53:07,216 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:53:07,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:54:24,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 733.44598 ± 305.104
2025-09-13 12:54:24,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [693.4062, 376.38986, 706.0109, 802.0811, 950.13556, 805.1456, 350.6732, 1228.2933, 291.72543, 1130.5986]
2025-09-13 12:54:24,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 149.0, 253.0, 255.0, 362.0, 285.0, 149.0, 407.0, 127.0, 373.0]
2025-09-13 12:54:24,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 8 hours, 45 minutes, 16 seconds)
2025-09-13 13:05:01,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:05:01,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:06:28,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 778.17944 ± 709.229
2025-09-13 13:06:28,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [705.0043, 1514.1421, 20.260246, 28.737577, 1298.2927, 2216.2969, 369.80188, 444.885, 17.199522, 1167.1738]
2025-09-13 13:06:28,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [268.0, 537.0, 25.0, 32.0, 468.0, 715.0, 155.0, 203.0, 32.0, 421.0]
2025-09-13 13:06:28,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 8 hours, 45 minutes, 8 seconds)
2025-09-13 13:16:51,143 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:16:51,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:18:52,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1143.73096 ± 731.140
2025-09-13 13:18:52,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1153.2646, 927.17017, 949.8438, 232.6912, 1966.8622, 276.5099, 874.4926, 2851.6523, 1036.9872, 1167.8356]
2025-09-13 13:18:52,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [436.0, 346.0, 313.0, 102.0, 680.0, 121.0, 289.0, 1000.0, 349.0, 438.0]
2025-09-13 13:18:52,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1143.73) for latency ExtremeSparseL4U32
2025-09-13 13:18:52,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 8 hours, 42 minutes, 35 seconds)
2025-09-13 13:29:19,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:29:19,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:30:40,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 766.27325 ± 771.225
2025-09-13 13:30:40,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2111.2588, 1704.5022, 322.19803, 33.499092, 26.684023, 14.598006, 887.0541, 113.535904, 699.58984, 1749.8124]
2025-09-13 13:30:40,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [726.0, 576.0, 135.0, 31.0, 27.0, 19.0, 323.0, 65.0, 240.0, 552.0]
2025-09-13 13:30:40,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 33 minutes, 10 seconds)
2025-09-13 13:40:58,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:40:58,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:41:38,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 336.69263 ± 343.818
2025-09-13 13:41:38,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [232.98157, 1312.0708, 491.3034, 248.82127, 263.6571, 204.37062, 191.91875, 159.90607, 250.62328, 11.273409]
2025-09-13 13:41:38,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 426.0, 189.0, 112.0, 118.0, 97.0, 91.0, 78.0, 106.0, 13.0]
2025-09-13 13:41:38,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 13 minutes, 41 seconds)
2025-09-13 13:52:31,221 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:52:31,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:53:51,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 750.71460 ± 736.662
2025-09-13 13:53:51,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [212.87347, 2426.671, 233.1345, 1233.0524, 271.4235, 172.1731, 1673.7941, 215.33803, 659.4422, 409.2438]
2025-09-13 13:53:51,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 842.0, 103.0, 447.0, 114.0, 86.0, 519.0, 95.0, 216.0, 158.0]
2025-09-13 13:53:51,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 7 minutes, 27 seconds)
2025-09-13 14:03:53,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:03:53,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:04:47,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 490.52643 ± 453.935
2025-09-13 14:04:47,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [371.7622, 912.5578, 353.01517, 139.05945, 50.228394, 150.00908, 1396.692, 187.24236, 204.97984, 1139.718]
2025-09-13 14:04:47,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 311.0, 146.0, 69.0, 38.0, 73.0, 463.0, 88.0, 96.0, 365.0]
2025-09-13 14:04:47,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 7 hours, 46 minutes, 29 seconds)
2025-09-13 14:15:34,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:15:34,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:16:32,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 474.94565 ± 616.798
2025-09-13 14:16:32,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [434.2164, 27.27589, 17.44873, 103.53705, 2254.47, 365.85266, 507.2259, 285.60812, 477.81146, 276.01047]
2025-09-13 14:16:32,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 31.0, 19.0, 87.0, 798.0, 186.0, 209.0, 124.0, 178.0, 119.0]
2025-09-13 14:16:32,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 7 hours, 29 minutes, 51 seconds)
2025-09-13 14:26:38,206 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:26:38,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:27:30,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 477.03290 ± 311.240
2025-09-13 14:27:30,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [170.17738, 305.51096, 132.12251, 986.74335, 524.7181, 104.260284, 804.5927, 932.46027, 404.79437, 404.94928]
2025-09-13 14:27:30,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 129.0, 70.0, 308.0, 195.0, 72.0, 274.0, 297.0, 165.0, 154.0]
2025-09-13 14:27:30,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 12 minutes)
2025-09-13 14:37:57,464 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:37:57,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:39:29,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 871.49933 ± 491.714
2025-09-13 14:39:29,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1036.8306, 1195.8665, 433.2308, 422.5783, 920.85114, 1155.2701, 634.63464, 196.90405, 2000.2745, 718.55273]
2025-09-13 14:39:29,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [342.0, 413.0, 173.0, 169.0, 319.0, 372.0, 226.0, 90.0, 692.0, 261.0]
2025-09-13 14:39:29,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 8 minutes, 5 seconds)
2025-09-13 14:50:24,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:50:24,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:51:36,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 653.38635 ± 562.916
2025-09-13 14:51:36,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1533.1085, 1786.7849, 159.82758, 177.15245, 216.2454, 455.3341, 248.71902, 823.0691, 884.60333, 249.01834]
2025-09-13 14:51:36,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [510.0, 618.0, 84.0, 89.0, 98.0, 186.0, 109.0, 316.0, 282.0, 107.0]
2025-09-13 14:51:37,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 6 hours, 55 minutes, 49 seconds)
2025-09-13 15:01:43,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:01:43,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:02:29,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 396.87659 ± 355.792
2025-09-13 15:02:29,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1254.3434, 609.0974, 421.6652, 556.66016, 435.04495, 9.915813, 410.16458, 31.306644, 16.779446, 223.78798]
2025-09-13 15:02:29,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [412.0, 221.0, 167.0, 211.0, 175.0, 15.0, 179.0, 29.0, 20.0, 105.0]
2025-09-13 15:02:29,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 6 hours, 43 minutes, 56 seconds)
2025-09-13 15:13:05,140 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:13:05,150 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:13:43,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 317.71216 ± 366.422
2025-09-13 15:13:43,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [185.9541, 504.92484, 16.823114, 184.90338, 14.983549, 821.73047, 277.07404, 19.421017, 22.167908, 1129.1392]
2025-09-13 15:13:43,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 180.0, 20.0, 84.0, 20.0, 295.0, 120.0, 21.0, 26.0, 441.0]
2025-09-13 15:13:43,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 6 hours, 28 minutes, 47 seconds)
2025-09-13 15:23:49,556 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:23:49,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:25:30,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 941.41180 ± 1010.154
2025-09-13 15:25:30,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2918.6028, 945.2164, 150.4054, 2894.5806, 728.4743, 278.0279, 613.16003, 224.00189, 350.5918, 311.05753]
2025-09-13 15:25:30,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 341.0, 105.0, 1000.0, 267.0, 112.0, 193.0, 101.0, 145.0, 149.0]
2025-09-13 15:25:30,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 22 minutes, 43 seconds)
2025-09-13 15:35:58,820 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:35:58,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:36:50,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 438.64886 ± 522.549
2025-09-13 15:36:50,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [10.578267, 17.388847, 858.96533, 725.3472, 663.51337, 389.13348, 1672.6797, 14.353287, 16.482141, 18.047508]
2025-09-13 15:36:50,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 20.0, 322.0, 295.0, 263.0, 157.0, 566.0, 31.0, 23.0, 24.0]
2025-09-13 15:36:50,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 7 minutes, 4 seconds)
2025-09-13 15:47:16,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:47:16,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:47:49,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 243.81966 ± 213.458
2025-09-13 15:47:49,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [121.97891, 307.5101, 713.90076, 34.36823, 501.15167, 186.50233, 23.363358, 239.71992, 292.58682, 17.114325]
2025-09-13 15:47:49,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 133.0, 256.0, 31.0, 242.0, 97.0, 26.0, 102.0, 121.0, 21.0]
2025-09-13 15:47:49,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 5 hours, 48 minutes, 26 seconds)
2025-09-13 15:58:18,784 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:58:18,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:59:11,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 450.90723 ± 608.343
2025-09-13 15:59:11,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.485651, 221.49469, 136.05461, 1332.7311, 114.804565, 1762.3077, 13.659908, 18.523232, 877.66046, 14.350121]
2025-09-13 15:59:11,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 120.0, 67.0, 461.0, 63.0, 614.0, 17.0, 20.0, 331.0, 20.0]
2025-09-13 15:59:11,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 5 hours, 40 minutes, 14 seconds)
2025-09-13 16:10:12,021 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:10:12,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:10:43,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 249.52116 ± 212.748
2025-09-13 16:10:43,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [426.6002, 177.88351, 448.66525, 654.29486, 14.167547, 16.556168, 113.70478, 12.598507, 414.65186, 216.08907]
2025-09-13 16:10:43,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 84.0, 169.0, 258.0, 17.0, 20.0, 82.0, 15.0, 164.0, 96.0]
2025-09-13 16:10:43,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 5 hours, 30 minutes, 37 seconds)
2025-09-13 16:20:48,607 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:20:48,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:22:16,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 837.93341 ± 841.018
2025-09-13 16:22:16,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [891.039, 1281.8212, 386.93234, 17.60096, 1022.80774, 396.55026, 238.36818, 30.141682, 3001.357, 1112.7163]
2025-09-13 16:22:16,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [314.0, 441.0, 148.0, 26.0, 309.0, 149.0, 104.0, 33.0, 1000.0, 378.0]
2025-09-13 16:22:16,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 17 minutes, 54 seconds)
2025-09-13 16:32:36,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:32:36,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:34:00,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 774.19043 ± 676.168
2025-09-13 16:34:00,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1845.6451, 1406.7993, 28.424858, 23.864126, 205.27733, 1307.1677, 1607.4177, 657.3956, 641.679, 18.23311]
2025-09-13 16:34:00,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [631.0, 493.0, 29.0, 29.0, 115.0, 481.0, 499.0, 232.0, 273.0, 33.0]
2025-09-13 16:34:00,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 8 minutes, 41 seconds)
2025-09-13 16:44:50,963 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:44:50,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:45:30,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 329.28937 ± 473.138
2025-09-13 16:45:30,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.803521, 32.520985, 13.29743, 20.382824, 11.688423, 1260.405, 568.62036, 174.75003, 1173.8357, 23.589272]
2025-09-13 16:45:30,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 30.0, 21.0, 23.0, 15.0, 461.0, 240.0, 83.0, 414.0, 33.0]
2025-09-13 16:45:30,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 4 hours, 59 minutes, 59 seconds)
2025-09-13 16:55:55,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:55:55,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:57:06,908 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 629.05676 ± 585.240
2025-09-13 16:57:06,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1229.2222, 248.1698, 119.767136, 893.34607, 21.348408, 1851.91, 17.271227, 816.81305, 157.33896, 935.38116]
2025-09-13 16:57:06,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [434.0, 104.0, 85.0, 334.0, 29.0, 666.0, 18.0, 329.0, 75.0, 294.0]
2025-09-13 16:57:06,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 4 hours, 49 minutes, 35 seconds)
2025-09-13 17:07:19,915 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:07:19,918 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:09:32,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1272.05408 ± 917.393
2025-09-13 17:09:32,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2643.646, 1258.7864, 744.6243, 647.7201, 139.39854, 1510.6509, 159.15071, 1703.2648, 2997.0818, 916.217]
2025-09-13 17:09:32,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [904.0, 426.0, 278.0, 235.0, 75.0, 551.0, 75.0, 524.0, 1000.0, 319.0]
2025-09-13 17:09:32,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1272.05) for latency ExtremeSparseL4U32
2025-09-13 17:09:32,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 42 minutes, 19 seconds)
2025-09-13 17:20:02,379 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:20:02,381 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:21:48,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 998.57678 ± 814.323
2025-09-13 17:21:48,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [279.32996, 1335.0562, 324.67307, 1443.4235, 2873.2898, 136.86093, 1278.9977, 1047.5651, 23.517029, 1243.0546]
2025-09-13 17:21:48,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 461.0, 132.0, 482.0, 932.0, 70.0, 470.0, 361.0, 25.0, 451.0]
2025-09-13 17:21:48,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 33 minutes, 53 seconds)
2025-09-13 17:31:58,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:31:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:33:19,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 740.53546 ± 741.029
2025-09-13 17:33:19,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [496.38904, 570.6688, 1739.0177, 812.70197, 117.72325, 11.26893, 17.8261, 735.6695, 465.6228, 2438.466]
2025-09-13 17:33:19,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [188.0, 223.0, 591.0, 292.0, 81.0, 15.0, 18.0, 290.0, 180.0, 860.0]
2025-09-13 17:33:19,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 21 minutes)
2025-09-13 17:43:42,275 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:43:42,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:44:27,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 383.23679 ± 479.483
2025-09-13 17:44:27,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [419.8153, 475.29877, 267.083, 246.85394, 70.85232, 15.829148, 18.505192, 22.745264, 1692.1168, 603.2683]
2025-09-13 17:44:27,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 181.0, 119.0, 111.0, 42.0, 22.0, 30.0, 28.0, 593.0, 207.0]
2025-09-13 17:44:27,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 7 minutes, 35 seconds)
2025-09-13 17:55:04,135 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:55:04,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:56:47,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 951.49249 ± 657.729
2025-09-13 17:56:47,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1132.2118, 432.91162, 2029.9369, 162.4182, 19.68859, 1038.3708, 399.52625, 1189.1195, 1186.2264, 1924.5145]
2025-09-13 17:56:47,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [400.0, 168.0, 729.0, 83.0, 21.0, 398.0, 160.0, 415.0, 421.0, 670.0]
2025-09-13 17:56:47,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 3 hours, 58 minutes, 40 seconds)
2025-09-13 18:07:02,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:07:02,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:08:42,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 941.57831 ± 820.416
2025-09-13 18:08:42,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2065.4531, 1514.2794, 247.83775, 237.5078, 176.00444, 802.8548, 2578.229, 1187.2261, 344.21982, 262.17084]
2025-09-13 18:08:42,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [705.0, 533.0, 106.0, 104.0, 83.0, 296.0, 852.0, 429.0, 144.0, 116.0]
2025-09-13 18:08:42,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 44 minutes, 49 seconds)
2025-09-13 18:19:27,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:19:27,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:20:30,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 502.85333 ± 800.688
2025-09-13 18:20:30,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [379.84918, 27.385578, 14.275447, 596.51495, 283.11197, 265.56836, 100.98164, 16.223007, 513.82745, 2830.7957]
2025-09-13 18:20:30,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 34.0, 17.0, 221.0, 189.0, 127.0, 62.0, 20.0, 244.0, 967.0]
2025-09-13 18:20:30,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 31 minutes, 16 seconds)
2025-09-13 18:30:45,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:30:45,198 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:32:14,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 836.23279 ± 1066.537
2025-09-13 18:32:14,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.66178, 195.9887, 599.2658, 2960.59, 682.0828, 142.32356, 17.208504, 2889.1382, 463.38657, 396.68222]
2025-09-13 18:32:14,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 102.0, 233.0, 1000.0, 255.0, 100.0, 26.0, 973.0, 176.0, 167.0]
2025-09-13 18:32:14,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 20 minutes, 19 seconds)
2025-09-13 18:42:55,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:42:55,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:44:10,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 691.51154 ± 875.333
2025-09-13 18:44:10,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1225.3462, 3031.4878, 81.45056, 201.95673, 199.14658, 84.974106, 15.531395, 988.2322, 330.59912, 756.3904]
2025-09-13 18:44:10,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [425.0, 1000.0, 54.0, 91.0, 98.0, 54.0, 31.0, 350.0, 159.0, 248.0]
2025-09-13 18:44:10,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 11 minutes, 4 seconds)
2025-09-13 18:54:28,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:54:28,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:54:55,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 200.92595 ± 241.452
2025-09-13 18:54:55,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [425.9749, 130.02342, 13.171917, 18.07321, 779.6149, 18.254051, 221.45384, 360.09735, 14.639467, 27.956459]
2025-09-13 18:54:55,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 68.0, 20.0, 24.0, 292.0, 25.0, 129.0, 146.0, 16.0, 33.0]
2025-09-13 18:54:55,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 2 hours, 54 minutes, 26 seconds)
2025-09-13 19:05:13,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:05:13,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:05:43,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 225.67204 ± 289.963
2025-09-13 19:05:43,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [120.62402, 56.49069, 9.669993, 21.389128, 956.0578, 176.72523, 84.168526, 559.91034, 256.2666, 15.418322]
2025-09-13 19:05:43,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 34.0, 13.0, 26.0, 314.0, 81.0, 60.0, 205.0, 120.0, 44.0]
2025-09-13 19:05:43,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 39 minutes, 38 seconds)
2025-09-13 19:16:33,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:16:33,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:17:52,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 733.01520 ± 898.470
2025-09-13 19:17:52,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [226.53568, 956.85925, 218.67183, 2930.9285, 1627.9696, 22.005516, 23.319937, 1061.3381, 12.237195, 250.28592]
2025-09-13 19:17:52,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 315.0, 101.0, 986.0, 554.0, 24.0, 28.0, 372.0, 14.0, 107.0]
2025-09-13 19:17:52,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 29 minutes, 10 seconds)
2025-09-13 19:28:11,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:28:11,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:29:18,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 626.58655 ± 707.104
2025-09-13 19:29:18,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [124.41216, 261.2603, 406.06418, 1042.7451, 2202.398, 1516.6788, 16.772991, 17.496752, 662.1278, 15.909559]
2025-09-13 19:29:18,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 112.0, 159.0, 350.0, 732.0, 505.0, 26.0, 20.0, 237.0, 17.0]
2025-09-13 19:29:18,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 16 minutes, 57 seconds)
2025-09-13 19:40:08,666 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:40:08,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:41:45,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 897.09406 ± 817.779
2025-09-13 19:41:45,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1820.4054, 255.53966, 1681.6333, 48.49634, 184.178, 161.50838, 2247.9111, 440.6981, 376.4455, 1754.1251]
2025-09-13 19:41:45,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [626.0, 109.0, 572.0, 37.0, 109.0, 78.0, 736.0, 183.0, 168.0, 571.0]
2025-09-13 19:41:45,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 6 minutes, 41 seconds)
2025-09-13 19:51:45,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:51:45,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:52:31,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 406.68811 ± 851.199
2025-09-13 19:52:31,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [2826.5862, 15.208568, 49.306133, 14.058862, 17.677467, 943.6784, 114.2632, 47.621437, 19.10712, 19.373852]
2025-09-13 19:52:31,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [918.0, 19.0, 45.0, 22.0, 25.0, 334.0, 60.0, 36.0, 30.0, 20.0]
2025-09-13 19:52:31,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 1 hour, 55 minutes, 11 seconds)
2025-09-13 20:02:58,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:02:58,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:04:51,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1100.58167 ± 863.264
2025-09-13 20:04:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1030.1454, 212.41357, 1135.62, 951.7881, 13.580119, 1999.2496, 3022.9438, 1345.7152, 1136.6643, 157.69644]
2025-09-13 20:04:51,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [361.0, 110.0, 385.0, 337.0, 16.0, 670.0, 1000.0, 456.0, 377.0, 74.0]
2025-09-13 20:04:51,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 46 minutes, 25 seconds)
2025-09-13 20:15:18,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:15:18,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:16:28,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 632.52686 ± 555.839
2025-09-13 20:16:28,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [931.65393, 1727.9867, 1360.0204, 445.15927, 990.8397, 232.82132, 72.91567, 24.833292, 241.69093, 297.34705]
2025-09-13 20:16:28,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [336.0, 614.0, 481.0, 171.0, 353.0, 103.0, 43.0, 25.0, 107.0, 120.0]
2025-09-13 20:16:28,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 33 minutes, 45 seconds)
2025-09-13 20:27:20,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:27:20,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:28:53,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 931.04803 ± 696.043
2025-09-13 20:28:53,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [557.17206, 929.38416, 1042.003, 1567.3152, 243.09305, 92.634636, 2550.4717, 479.1302, 1301.9696, 547.307]
2025-09-13 20:28:53,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 303.0, 332.0, 493.0, 108.0, 58.0, 812.0, 181.0, 437.0, 198.0]
2025-09-13 20:28:53,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 23 minutes, 24 seconds)
2025-09-13 20:38:42,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:38:42,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:39:49,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 612.34381 ± 696.763
2025-09-13 20:39:49,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1120.9819, 77.915146, 17.733686, 11.208072, 635.21246, 2403.172, 14.97658, 471.06326, 699.9903, 671.18445]
2025-09-13 20:39:49,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [410.0, 52.0, 22.0, 14.0, 254.0, 771.0, 16.0, 207.0, 253.0, 238.0]
2025-09-13 20:39:49,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 9 minutes, 41 seconds)
2025-09-13 20:50:18,782 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:50:18,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:50:56,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 299.55215 ± 273.429
2025-09-13 20:50:56,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [134.2423, 202.61345, 873.3929, 432.00574, 198.09232, 690.23486, 335.84073, 25.426233, 83.102234, 20.570536]
2025-09-13 20:50:56,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 114.0, 323.0, 170.0, 89.0, 253.0, 140.0, 23.0, 48.0, 31.0]
2025-09-13 20:50:56,256 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 58 minutes, 24 seconds)
2025-09-13 21:01:47,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:01:47,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:04:49,673 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 1814.74438 ± 1187.290
2025-09-13 21:04:49,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [817.1516, 15.3014, 3037.8474, 2860.8987, 986.00995, 2654.477, 3055.8037, 2917.0747, 1755.0787, 47.79985]
2025-09-13 21:04:49,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [332.0, 17.0, 1000.0, 946.0, 361.0, 834.0, 979.0, 911.0, 603.0, 42.0]
2025-09-13 21:04:49,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1226 [INFO]: New best (1814.74) for latency ExtremeSparseL4U32
2025-09-13 21:04:49,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 47 minutes, 58 seconds)
2025-09-13 21:15:01,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:15:01,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:16:04,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 575.10052 ± 686.470
2025-09-13 21:16:04,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [1960.9022, 695.7885, 1176.2136, 257.5723, 1493.8383, 18.686577, 12.420198, 20.886208, 86.237404, 28.460398]
2025-09-13 21:16:04,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [652.0, 243.0, 417.0, 113.0, 520.0, 26.0, 28.0, 29.0, 62.0, 28.0]
2025-09-13 21:16:04,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 35 minutes, 45 seconds)
2025-09-13 21:27:06,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:27:06,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:28:16,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 652.49060 ± 904.694
2025-09-13 21:28:16,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [518.6172, 2960.2964, 425.84, 129.04434, 24.825148, 19.214737, 718.50476, 76.97627, 1639.6788, 11.908794]
2025-09-13 21:28:16,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [192.0, 962.0, 178.0, 73.0, 26.0, 19.0, 265.0, 52.0, 561.0, 26.0]
2025-09-13 21:28:16,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 23 minutes, 45 seconds)
2025-09-13 21:38:01,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:38:01,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:39:33,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 894.15051 ± 765.565
2025-09-13 21:39:33,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [750.1896, 1288.3755, 1714.2634, 2512.978, 464.41217, 11.947833, 115.46041, 1122.3137, 943.62726, 17.93775]
2025-09-13 21:39:33,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 451.0, 558.0, 816.0, 175.0, 14.0, 62.0, 380.0, 340.0, 22.0]
2025-09-13 21:39:33,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 11 minutes, 56 seconds)
2025-09-13 21:49:53,028 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:49:53,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:51:08,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 725.30334 ± 747.854
2025-09-13 21:51:09,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1222 [DEBUG]: All rewards: [394.90253, 2594.0771, 454.72305, 697.40247, 14.750186, 1058.5308, 617.32666, 1329.5022, 17.675196, 74.14244]
2025-09-13 21:51:09,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [151.0, 825.0, 168.0, 244.0, 18.0, 369.0, 227.0, 449.0, 21.0, 59.0]
2025-09-13 21:51:09,011 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc15-hopper):1251 [DEBUG]: Training session finished
