2025-09-13 12:52:33,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 12:52:33,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 12:52:33,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x149adf7f8190>}
2025-09-13 12:52:33,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1111 [DEBUG]: using device: cuda
2025-09-13 12:52:33,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1133 [INFO]: Creating new trainer
2025-09-13 12:52:33,565 baseline-mbpac-noiseperc0-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-13 12:52:33,565 baseline-mbpac-noiseperc0-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 12:52:33,573 baseline-mbpac-noiseperc0-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
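The module printouts above fix every layer shape, so the trainable-parameter count of each network can be checked by hand. A few shape relations are visible. The Q network's input width of 23 is 17 + 6, which matches Walker2d's 17-dimensional observation and 6-dimensional action, and the emitter's 17-wide `mu`/`log_std` outputs. The 384-wide input to the policy and to the emitter matches the GRU hidden size. That suggests both act on the recurrent latent, but this is inferred from the shapes only. The sketch below counts only `Linear` and `GRU` weights. It assumes that `NNTanhRefit`, `NNLayerClipSiLU`, `Flatten`/`Unflatten`, `NNLayerSqueeze` and `Identity` add no trainable parameters, which the log does not confirm.

```python
# Parameter counts implied by the module printouts (weights + biases only).
# Assumption: the custom NN* wrapper layers hold no trainable parameters.

def linear_params(n_in: int, n_out: int) -> int:
    """nn.Linear(n_in, n_out, bias=True): n_out x n_in weight plus n_out bias."""
    return n_in * n_out + n_out

def gru_params(n_in: int, n_hidden: int) -> int:
    """Single-layer, unidirectional nn.GRU with biases (3 gates: r, z, n)."""
    return 3 * (n_hidden * n_in + n_hidden * n_hidden + 2 * n_hidden)

# pi network: common_head + mu_head + log_std_head
policy = (linear_params(384, 256) + linear_params(256, 256)
          + 2 * linear_params(256, 6))

# q network: concatenated (state, action) of width 23 -> scalar
q_net = linear_params(23, 256) + linear_params(256, 256) + linear_params(256, 1)

# NNPredictiveRecurrent: emitter trunk + two heads, embeddings, GRU core
emitter_trunk = linear_params(384, 256) + 2 * linear_params(256, 256)
emitter_heads = 2 * (linear_params(256, 256) + linear_params(256, 17))
embed_state = (linear_params(17, 256) + linear_params(256, 256)
               + linear_params(256, 384))
embed_action = linear_params(6, 256) + linear_params(256, 256)
rec = gru_params(256, 384)
model = emitter_trunk + emitter_heads + embed_state + embed_action + rec

print(policy, q_net, model)
```

The GRU formula follows PyTorch's layout: `weight_ih_l0` is `(3*hidden, input)`, `weight_hh_l0` is `(3*hidden, hidden)`, and there are two bias vectors of length `3*hidden`. With this layout the recurrent core is the largest single component of the model.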
2025-09-13 12:52:34,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1194 [DEBUG]: Starting training session...
2025-09-13 12:52:34,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 1/100
2025-09-13 13:03:35,252 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:03:35,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:03:55,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: -12.02119 ± 2.508
2025-09-13 13:03:55,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [-7.075186, -13.397743, -14.290932, -12.146852, -13.05531, -9.055593, -9.734318, -15.901394, -12.945987, -12.608589]
2025-09-13 13:03:55,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [69.0, 64.0, 64.0, 69.0, 64.0, 70.0, 70.0, 61.0, 66.0, 65.0]
2025-09-13 13:03:55,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (-12.02) for latency ExtremeSparseL4U32
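The `Total Reward` line appears to summarise the ten per-episode returns on the following `All rewards` line as mean ± standard deviation. The sketch below recomputes both from the iteration-1 values. The spread matches the population standard deviation (divide by N, as `numpy.std` does by default), not the sample one. This is inferred from the numbers, not from the training code.

```python
import statistics

# Per-episode returns copied from the iteration-1 "All rewards" line.
rewards = [-7.075186, -13.397743, -14.290932, -12.146852, -13.05531,
           -9.055593, -9.734318, -15.901394, -12.945987, -12.608589]

mean = statistics.fmean(rewards)
pop_std = statistics.pstdev(rewards)    # divides by N (numpy.std default)
sample_std = statistics.stdev(rewards)  # divides by N - 1, for comparison

# Should land close to the logged "-12.02119 ± 2.508".
print(f"{mean:.5f} ± {pop_std:.3f} (sample std: {sample_std:.3f})")
```

The sample standard deviation comes out near 2.64, clearly different from the logged 2.508. With only ten evaluation episodes per iteration, the reported ± is a wide spread rather than a confidence interval on the mean.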
2025-09-13 13:03:55,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 18 hours, 42 minutes, 45 seconds)
2025-09-13 13:14:41,569 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:14:41,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:15:34,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 118.75198 ± 104.007
2025-09-13 13:15:34,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1.9213915, 23.574144, -14.465059, 8.826869, 244.45264, 85.07303, 214.01294, 227.75215, 147.56215, 248.80957]
2025-09-13 13:15:34,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 244.0, 154.0, 114.0, 264.0, 230.0, 151.0, 159.0, 110.0, 217.0]
2025-09-13 13:15:34,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (118.75) for latency ExtremeSparseL4U32
2025-09-13 13:15:34,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 46 minutes, 59 seconds)
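The "estimated time remaining" values are consistent with a simple rule. Take the mean duration of the most recent (up to) five iterations, measured between consecutive `Iteration k/100` timestamps. Multiply it by the number of iterations not yet finished, and truncate to whole seconds. This rule reproduces the logged estimates we checked: iterations 2 to 7, 10, and 22 ("16 hours, 31 seconds"). The window size of 5 and the truncation are reconstructed from the timestamps, not taken from the source code, so treat the sketch below as an assumption.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Sequence

FMT = "%Y-%m-%d %H:%M:%S,%f"   # logging's default asctime format
TOTAL_ITERS = 100
WINDOW = 5                      # inferred from the log, not confirmed

def eta_from_starts(start_stamps: Sequence[str]) -> timedelta:
    """Estimate remaining time from the start stamps of iterations 1..k.

    Assumed rule: average the last WINDOW iteration durations, multiply by
    the iterations still to run, and truncate to whole seconds.
    """
    starts = [datetime.strptime(s, FMT) for s in start_stamps]
    recent = deque(maxlen=WINDOW)           # keeps only the newest durations
    for prev, cur in zip(starts, starts[1:]):
        recent.append((cur - prev).total_seconds())
    done = len(starts) - 1                  # iterations already finished
    per_iter = sum(recent) / len(recent)
    return timedelta(seconds=int(per_iter * (TOTAL_ITERS - done)))

# Start stamps of iterations 1 and 2 from the log above.
print(eta_from_starts(["2025-09-13 12:52:34,587", "2025-09-13 13:03:55,049"]))
```

A short moving window makes the estimate track recent slowdowns. This matters because later evaluations take longer as episodes grow toward 1000 steps. A plain all-time average would lag behind.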
2025-09-13 13:26:19,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:26:19,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:27:28,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 303.11237 ± 135.459
2025-09-13 13:27:28,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [227.9912, 298.42215, 547.3249, 169.99908, 139.09288, 297.11496, 241.1875, 526.75934, 193.09576, 390.1357]
2025-09-13 13:27:28,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 194.0, 347.0, 346.0, 138.0, 188.0, 170.0, 408.0, 130.0, 274.0]
2025-09-13 13:27:28,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (303.11) for latency ExtremeSparseL4U32
2025-09-13 13:27:28,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 48 minutes, 30 seconds)
2025-09-13 13:38:10,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:38:10,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:39:25,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 329.33734 ± 82.768
2025-09-13 13:39:25,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [324.61047, 343.57782, 276.56686, 370.52936, 411.93546, 311.7762, 452.57056, 284.70993, 137.16467, 379.9319]
2025-09-13 13:39:25,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 193.0, 158.0, 244.0, 211.0, 188.0, 364.0, 154.0, 298.0, 555.0]
2025-09-13 13:39:25,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (329.34) for latency ExtremeSparseL4U32
2025-09-13 13:39:25,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 44 minutes, 29 seconds)
2025-09-13 13:50:00,671 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:50:00,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:51:11,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 365.18893 ± 133.329
2025-09-13 13:51:11,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [164.24818, 475.9149, 406.56808, 346.69983, 258.4936, 390.325, 330.5266, 655.9927, 212.7237, 410.3967]
2025-09-13 13:51:11,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [91.0, 290.0, 229.0, 224.0, 161.0, 260.0, 222.0, 473.0, 127.0, 310.0]
2025-09-13 13:51:11,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (365.19) for latency ExtremeSparseL4U32
2025-09-13 13:51:11,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 33 minutes, 47 seconds)
2025-09-13 14:01:56,001 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:01:56,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:02:59,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 334.00787 ± 170.905
2025-09-13 14:02:59,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [351.28973, 243.01614, 56.414104, 385.05975, 379.31573, 485.7261, 431.42905, 505.0451, 0.019252364, 502.76376]
2025-09-13 14:02:59,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 229.0, 73.0, 253.0, 205.0, 382.0, 244.0, 245.0, 19.0, 251.0]
2025-09-13 14:02:59,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 30 minutes, 31 seconds)
2025-09-13 14:13:36,768 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:13:36,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:14:46,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 241.12798 ± 146.806
2025-09-13 14:14:46,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [218.89471, 46.889652, 350.51706, 275.89508, 383.63528, 452.30197, 56.90119, 194.33177, 40.139065, 391.77417]
2025-09-13 14:14:46,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 65.0, 261.0, 332.0, 251.0, 462.0, 97.0, 317.0, 70.0, 336.0]
2025-09-13 14:14:46,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 20 minutes, 57 seconds)
2025-09-13 14:25:29,586 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:25:29,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:26:37,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 377.94220 ± 106.285
2025-09-13 14:26:37,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [346.54037, 317.14243, 276.93872, 308.93423, 271.64325, 475.20984, 557.5628, 458.7943, 513.1328, 253.52318]
2025-09-13 14:26:37,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 202.0, 178.0, 190.0, 170.0, 299.0, 289.0, 268.0, 332.0, 151.0]
2025-09-13 14:26:37,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (377.94) for latency ExtremeSparseL4U32
2025-09-13 14:26:37,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 8 minutes, 17 seconds)
2025-09-13 14:37:37,549 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:37:37,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:39:07,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 390.25284 ± 166.503
2025-09-13 14:39:07,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [485.757, 513.72906, 521.2226, 235.34659, 94.06582, 237.97841, 490.8238, 257.96527, 660.3694, 405.27036]
2025-09-13 14:39:07,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [401.0, 381.0, 349.0, 175.0, 144.0, 149.0, 485.0, 215.0, 451.0, 253.0]
2025-09-13 14:39:07,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (390.25) for latency ExtremeSparseL4U32
2025-09-13 14:39:07,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 6 minutes, 33 seconds)
2025-09-13 14:49:44,722 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:49:44,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:50:39,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 338.44971 ± 115.107
2025-09-13 14:50:39,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [213.16899, 250.25989, 338.5928, 203.36855, 498.82275, 220.82817, 415.2397, 400.9502, 539.0652, 304.201]
2025-09-13 14:50:39,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [111.0, 141.0, 185.0, 102.0, 194.0, 125.0, 257.0, 263.0, 287.0, 152.0]
2025-09-13 14:50:39,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 50 minutes, 19 seconds)
2025-09-13 15:01:28,621 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:01:28,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:02:27,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 366.92676 ± 128.405
2025-09-13 15:02:27,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [288.13068, 465.7154, 548.8169, 525.41846, 177.49936, 414.69064, 423.2776, 181.90274, 247.47964, 396.33618]
2025-09-13 15:02:27,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [209.0, 195.0, 223.0, 255.0, 125.0, 273.0, 203.0, 124.0, 145.0, 185.0]
2025-09-13 15:02:27,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 38 minutes, 36 seconds)
2025-09-13 15:13:08,643 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:13:08,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:14:33,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 506.16888 ± 276.911
2025-09-13 15:14:33,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [391.18643, 1283.8484, 283.39624, 303.48898, 624.9291, 431.14648, 351.9509, 540.3576, 440.61697, 410.76813]
2025-09-13 15:14:33,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [257.0, 531.0, 147.0, 207.0, 380.0, 215.0, 175.0, 331.0, 379.0, 215.0]
2025-09-13 15:14:33,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (506.17) for latency ExtremeSparseL4U32
2025-09-13 15:14:33,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 32 minutes, 18 seconds)
2025-09-13 15:25:22,638 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:25:22,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:26:38,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 585.34332 ± 313.526
2025-09-13 15:26:38,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1244.8596, 332.4448, 825.09283, 717.1806, 825.0286, 462.53293, 305.48697, 282.88214, 671.82245, 186.10228]
2025-09-13 15:26:38,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [420.0, 143.0, 301.0, 264.0, 477.0, 176.0, 203.0, 171.0, 222.0, 104.0]
2025-09-13 15:26:38,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (585.34) for latency ExtremeSparseL4U32
2025-09-13 15:26:38,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 24 minutes, 10 seconds)
2025-09-13 15:37:27,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:37:27,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:38:09,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 273.55936 ± 190.810
2025-09-13 15:38:09,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [283.2484, 281.96072, 236.86476, 47.377975, 247.2798, 273.58105, 763.7481, 19.743444, 238.85936, 342.92987]
2025-09-13 15:38:09,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 148.0, 135.0, 60.0, 117.0, 138.0, 312.0, 37.0, 124.0, 164.0]
2025-09-13 15:38:09,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 55 minutes, 11 seconds)
2025-09-13 15:48:58,426 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:48:58,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:50:13,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 523.10126 ± 413.711
2025-09-13 15:50:13,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [466.74982, 471.62323, 254.93263, 469.788, 223.26903, 278.4211, 383.5519, 1385.494, 62.70344, 1234.4794]
2025-09-13 15:50:13,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [298.0, 259.0, 140.0, 262.0, 126.0, 157.0, 141.0, 518.0, 79.0, 496.0]
2025-09-13 15:50:13,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 52 minutes, 45 seconds)
2025-09-13 16:00:50,551 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:00:50,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:02:20,194 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 676.46082 ± 284.712
2025-09-13 16:02:20,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [207.37659, 1107.2188, 511.35358, 797.5357, 1186.6156, 533.6699, 529.97406, 678.38983, 766.12396, 446.35022]
2025-09-13 16:02:20,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 425.0, 207.0, 350.0, 480.0, 260.0, 249.0, 267.0, 338.0, 183.0]
2025-09-13 16:02:20,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (676.46) for latency ExtremeSparseL4U32
2025-09-13 16:02:20,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 45 minutes, 55 seconds)
2025-09-13 16:13:02,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:13:02,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:14:13,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 599.96167 ± 310.315
2025-09-13 16:14:13,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [423.1017, 275.88736, 286.35385, 293.30322, 1131.0359, 837.0178, 911.3849, 952.611, 329.74652, 559.1744]
2025-09-13 16:14:13,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 117.0, 124.0, 126.0, 417.0, 298.0, 346.0, 345.0, 151.0, 254.0]
2025-09-13 16:14:13,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 30 minutes, 23 seconds)
2025-09-13 16:24:55,823 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:24:55,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:26:07,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 576.88434 ± 400.928
2025-09-13 16:26:07,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1168.3962, 1061.0292, 244.77725, 639.8851, 267.47537, 231.45471, 299.8801, 380.41748, 1250.9594, 224.56836]
2025-09-13 16:26:07,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [441.0, 392.0, 123.0, 283.0, 126.0, 114.0, 140.0, 166.0, 457.0, 123.0]
2025-09-13 16:26:07,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 15 minutes, 44 seconds)
2025-09-13 16:37:05,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:37:05,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:38:13,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 577.07214 ± 228.349
2025-09-13 16:38:13,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [932.8801, 436.8069, 978.5067, 719.0378, 297.27518, 355.92386, 369.92603, 446.16705, 575.8004, 658.39764]
2025-09-13 16:38:13,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [357.0, 182.0, 381.0, 241.0, 132.0, 157.0, 163.0, 167.0, 190.0, 272.0]
2025-09-13 16:38:13,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 13 minutes, 13 seconds)
2025-09-13 16:48:49,359 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:48:49,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:50:08,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 669.26654 ± 359.671
2025-09-13 16:50:08,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [330.41058, 754.82245, 328.7214, 745.7795, 1502.7405, 215.21686, 591.8318, 831.16235, 935.9792, 456.00092]
2025-09-13 16:50:08,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [176.0, 311.0, 132.0, 288.0, 553.0, 99.0, 237.0, 311.0, 355.0, 159.0]
2025-09-13 16:50:08,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 58 minutes, 34 seconds)
2025-09-13 17:00:50,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:00:50,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:03:07,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1187.70251 ± 429.651
2025-09-13 17:03:07,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1100.723, 1315.553, 1499.7958, 941.57684, 1061.4559, 279.70428, 1686.119, 1082.5953, 995.6148, 1913.8872]
2025-09-13 17:03:07,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [424.0, 521.0, 570.0, 372.0, 456.0, 132.0, 646.0, 376.0, 410.0, 732.0]
2025-09-13 17:03:07,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1187.70) for latency ExtremeSparseL4U32
2025-09-13 17:03:07,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 31 seconds)
2025-09-13 17:13:55,200 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:13:55,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:15:32,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 911.07605 ± 703.545
2025-09-13 17:15:32,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [180.06145, 1209.8541, 244.95752, 850.06586, 1909.6575, 2120.2937, 230.81929, 1565.4032, 346.68237, 452.96527]
2025-09-13 17:15:32,041 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [100.0, 401.0, 121.0, 293.0, 667.0, 684.0, 113.0, 539.0, 140.0, 186.0]
2025-09-13 17:15:32,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 56 minutes, 30 seconds)
2025-09-13 17:26:36,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:26:36,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:28:35,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1186.97473 ± 294.173
2025-09-13 17:28:35,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1063.6406, 1730.3086, 1477.005, 1136.3916, 1036.0002, 1017.23425, 851.54, 1563.6345, 784.743, 1209.2489]
2025-09-13 17:28:35,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [355.0, 522.0, 492.0, 403.0, 348.0, 348.0, 313.0, 535.0, 275.0, 409.0]
2025-09-13 17:28:35,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 16 hours, 1 minute, 58 seconds)
2025-09-13 17:38:58,235 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:38:58,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:40:34,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 921.85999 ± 637.954
2025-09-13 17:40:34,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2014.254, 1560.5573, 224.39536, 1487.982, 349.72366, 254.3044, 603.15607, 202.16594, 1121.2715, 1400.7891]
2025-09-13 17:40:34,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [631.0, 487.0, 115.0, 471.0, 147.0, 166.0, 217.0, 107.0, 407.0, 480.0]
2025-09-13 17:40:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 47 minutes, 43 seconds)
2025-09-13 17:51:41,642 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:51:41,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:54:31,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1658.40234 ± 916.645
2025-09-13 17:54:31,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1850.2751, 3167.7554, 1570.5278, 1562.6974, 260.24118, 237.32664, 3039.6667, 1983.8666, 1468.8418, 1442.8256]
2025-09-13 17:54:31,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [676.0, 1000.0, 523.0, 595.0, 171.0, 146.0, 1000.0, 607.0, 497.0, 488.0]
2025-09-13 17:54:31,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1658.40) for latency ExtremeSparseL4U32
2025-09-13 17:54:31,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 16 hours, 5 minutes, 48 seconds)
2025-09-13 18:05:43,828 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:05:43,837 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:08:27,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1651.09937 ± 771.090
2025-09-13 18:08:27,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3035.384, 2998.0444, 1641.8014, 874.90314, 1365.6873, 796.7183, 1972.0454, 847.53375, 1501.7667, 1477.1106]
2025-09-13 18:08:27,953 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [928.0, 950.0, 542.0, 328.0, 490.0, 277.0, 591.0, 351.0, 473.0, 605.0]
2025-09-13 18:08:27,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 16 hours, 6 minutes, 59 seconds)
2025-09-13 18:18:35,391 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:18:35,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:21:40,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1774.22949 ± 1128.829
2025-09-13 18:21:40,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [895.915, 3200.594, 489.9253, 3097.652, 1905.6643, 3134.3652, 2433.224, 1937.1151, 344.90683, 302.93384]
2025-09-13 18:21:40,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 1000.0, 185.0, 1000.0, 620.0, 1000.0, 887.0, 678.0, 158.0, 179.0]
2025-09-13 18:21:40,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (1774.23) for latency ExtremeSparseL4U32
2025-09-13 18:21:40,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 16 hours, 5 minutes, 34 seconds)
2025-09-13 18:33:07,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:33:07,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:37:00,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2237.21021 ± 1088.851
2025-09-13 18:37:00,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2835.1067, 1823.2106, 3094.3418, 3215.248, 3314.6677, 2882.287, 179.25443, 2416.364, 2319.845, 291.77536]
2025-09-13 18:37:00,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 639.0, 1000.0, 1000.0, 1000.0, 1000.0, 88.0, 771.0, 1000.0, 173.0]
2025-09-13 18:37:00,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (2237.21) for latency ExtremeSparseL4U32
2025-09-13 18:37:00,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 16 hours, 25 minutes, 6 seconds)
2025-09-13 18:47:21,165 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:47:21,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:50:42,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2148.92627 ± 1169.542
2025-09-13 18:50:42,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3370.0032, 3265.5361, 3267.6316, 416.76053, 2687.5088, 371.07803, 1588.2588, 2056.5583, 3437.2751, 1028.6526]
2025-09-13 18:50:42,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 943.0, 1000.0, 162.0, 914.0, 162.0, 510.0, 597.0, 1000.0, 384.0]
2025-09-13 18:50:42,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 16 hours, 35 minutes, 46 seconds)
2025-09-13 19:01:18,107 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:01:18,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:04:35,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1983.81567 ± 1283.820
2025-09-13 19:04:35,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3163.5928, 3004.447, 355.10718, 3318.282, 347.55182, 861.76025, 305.45602, 3257.281, 3086.589, 2138.087]
2025-09-13 19:04:35,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 197.0, 983.0, 186.0, 283.0, 173.0, 1000.0, 1000.0, 747.0]
2025-09-13 19:04:35,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 16 hours, 20 minutes, 48 seconds)
2025-09-13 19:15:28,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:15:28,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:20:09,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3028.40527 ± 649.266
2025-09-13 19:20:09,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3382.6042, 3129.8357, 1188.6915, 2690.95, 3264.6646, 3390.6558, 3533.918, 3290.8254, 3264.2607, 3147.647]
2025-09-13 19:20:09,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 422.0, 825.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:20:09,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3028.41) for latency ExtremeSparseL4U32
2025-09-13 19:20:09,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 16 hours, 29 minutes, 16 seconds)
2025-09-13 19:31:01,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:31:01,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:34:55,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2547.72021 ± 1192.222
2025-09-13 19:34:55,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3355.2622, 3438.2031, 36.942314, 2474.7393, 2662.5767, 3093.2969, 461.3312, 3339.1138, 3225.096, 3390.6409]
2025-09-13 19:34:55,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 43.0, 806.0, 822.0, 1000.0, 197.0, 1000.0, 1000.0, 1000.0]
2025-09-13 19:34:55,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 16 hours, 36 minutes, 18 seconds)
2025-09-13 19:46:10,402 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:46:10,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:50:13,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2687.17114 ± 813.424
2025-09-13 19:50:13,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3368.0452, 3460.935, 1549.2721, 3343.2234, 3015.688, 1845.3354, 3475.8618, 1911.1973, 3366.72, 1535.4315]
2025-09-13 19:50:13,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 548.0, 1000.0, 1000.0, 544.0, 1000.0, 578.0, 1000.0, 469.0]
2025-09-13 19:50:13,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 16 hours, 21 minutes, 9 seconds)
2025-09-13 20:01:05,804 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:01:05,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:04:53,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2504.78760 ± 1101.515
2025-09-13 20:04:53,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3396.4934, 3231.9292, 1055.8324, 3351.5046, 277.37448, 3327.6963, 1692.0939, 2006.0323, 3380.9307, 3327.9907]
2025-09-13 20:04:53,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 963.0, 390.0, 1000.0, 122.0, 1000.0, 596.0, 645.0, 1000.0, 955.0]
2025-09-13 20:04:53,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 16 hours, 19 minutes, 16 seconds)
2025-09-13 20:15:25,901 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:15:25,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:19:20,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2604.16089 ± 984.974
2025-09-13 20:19:20,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3368.4373, 1429.0973, 1826.489, 1294.4098, 3328.0576, 3348.9622, 1113.7806, 3446.8147, 3442.5532, 3443.0085]
2025-09-13 20:19:20,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 475.0, 585.0, 437.0, 1000.0, 1000.0, 338.0, 1000.0, 1000.0, 1000.0]
2025-09-13 20:19:20,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 16 hours, 11 minutes, 41 seconds)
2025-09-13 20:29:42,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:29:42,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:33:32,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2572.76587 ± 1029.720
2025-09-13 20:33:32,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3060.3296, 1327.653, 2007.7656, 817.8351, 3463.9736, 3497.3718, 3530.1309, 3346.3582, 3372.9995, 1303.2411]
2025-09-13 20:33:32,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [932.0, 404.0, 623.0, 274.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 477.0]
2025-09-13 20:33:32,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 15 hours, 39 minutes, 27 seconds)
2025-09-13 20:44:56,852 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:44:56,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:49:00,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2715.94531 ± 1039.759
2025-09-13 20:49:00,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3108.982, 3243.791, 3346.0142, 3617.866, 294.3819, 3484.3865, 3208.9944, 3413.9834, 1699.3259, 1741.7288]
2025-09-13 20:49:00,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 145.0, 1000.0, 967.0, 1000.0, 537.0, 556.0]
2025-09-13 20:49:00,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 15 hours, 33 minutes, 29 seconds)
2025-09-13 20:59:02,499 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:59:02,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:03:00,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2638.28638 ± 996.263
2025-09-13 21:03:00,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1798.7126, 3495.8206, 3561.6616, 3064.4897, 1193.0908, 3400.5278, 817.7639, 3340.9932, 3517.977, 2191.829]
2025-09-13 21:03:00,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [567.0, 962.0, 1000.0, 1000.0, 413.0, 1000.0, 288.0, 1000.0, 1000.0, 651.0]
2025-09-13 21:03:00,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 15 hours, 2 minutes, 31 seconds)
2025-09-13 21:13:33,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:13:33,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:18:01,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3143.48438 ± 771.252
2025-09-13 21:18:01,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3584.4504, 3590.0073, 3546.404, 3408.4998, 3302.2537, 3091.3015, 927.319, 2905.4194, 3461.8713, 3617.3179]
2025-09-13 21:18:01,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 919.0, 875.0, 340.0, 841.0, 1000.0, 1000.0]
2025-09-13 21:18:01,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3143.48) for latency ExtremeSparseL4U32
2025-09-13 21:18:01,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 14 hours, 52 minutes, 14 seconds)
2025-09-13 21:29:38,480 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:29:38,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:34:17,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3201.44385 ± 419.773
2025-09-13 21:34:17,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2341.262, 3431.3503, 3409.4446, 3300.2146, 3452.8296, 3326.9993, 2405.9595, 3355.4106, 3432.5461, 3558.4211]
2025-09-13 21:34:17,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [691.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 659.0, 1000.0, 1000.0, 1000.0]
2025-09-13 21:34:17,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3201.44) for latency ExtremeSparseL4U32
2025-09-13 21:34:17,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 14 hours, 59 minutes, 32 seconds)
2025-09-13 21:44:47,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:44:48,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:48:48,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2833.73657 ± 1101.544
2025-09-13 21:48:48,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3665.2588, 1006.535, 3567.3447, 456.06607, 3386.1067, 3428.5217, 3659.4983, 2899.9963, 2694.1868, 3573.8503]
2025-09-13 21:48:48,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 346.0, 1000.0, 168.0, 1000.0, 1000.0, 1000.0, 828.0, 748.0, 1000.0]
2025-09-13 21:48:48,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 14 hours, 48 minutes, 9 seconds)
2025-09-13 21:59:01,756 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:59:01,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:02:33,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2454.36963 ± 965.114
2025-09-13 22:02:33,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1808.621, 3403.746, 1404.5587, 1177.4829, 2377.3584, 3599.1008, 1208.4147, 2467.6433, 3570.0334, 3526.7354]
2025-09-13 22:02:33,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [512.0, 1000.0, 407.0, 356.0, 687.0, 1000.0, 367.0, 722.0, 1000.0, 1000.0]
2025-09-13 22:02:33,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 13 minutes, 2 seconds)
2025-09-13 22:13:14,676 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:13:14,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:17:45,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3150.92334 ± 839.659
2025-09-13 22:17:45,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3484.1934, 3540.7097, 3829.849, 926.3887, 3523.166, 3349.8862, 3593.071, 3436.4453, 3548.8284, 2276.6956]
2025-09-13 22:17:45,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 315.0, 1000.0, 920.0, 1000.0, 1000.0, 1000.0, 659.0]
2025-09-13 22:17:45,223 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 12 minutes, 2 seconds)
2025-09-13 22:29:03,173 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:29:03,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:33:16,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2909.67212 ± 1152.824
2025-09-13 22:33:16,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3584.0393, 3491.6218, 3355.7642, 603.915, 3425.9622, 3704.0615, 3348.3965, 3413.8843, 622.5061, 3546.5706]
2025-09-13 22:33:16,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 280.0, 1000.0, 1000.0, 1000.0, 1000.0, 269.0, 1000.0]
2025-09-13 22:33:16,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 2 minutes, 47 seconds)
2025-09-13 22:43:26,168 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:43:26,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:47:24,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2855.88916 ± 1184.616
2025-09-13 22:47:24,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3550.8816, 937.3613, 3657.59, 3653.5303, 3641.6467, 3690.4026, 1179.396, 3551.1313, 1031.7211, 3665.2302]
2025-09-13 22:47:24,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 322.0, 1000.0, 1000.0, 1000.0, 1000.0, 366.0, 1000.0, 320.0, 1000.0]
2025-09-13 22:47:24,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 13 hours, 24 minutes, 17 seconds)
2025-09-13 22:58:45,467 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:58:45,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:03:02,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2901.05029 ± 1040.460
2025-09-13 23:03:02,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2094.5334, 245.12787, 3511.498, 3639.0767, 3547.2551, 3109.4087, 3565.62, 3608.5742, 3474.8853, 2214.525]
2025-09-13 23:03:02,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [649.0, 131.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 702.0]
2025-09-13 23:03:02,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 13 hours, 21 minutes, 35 seconds)
2025-09-13 23:13:27,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:13:27,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:17:44,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3019.86182 ± 913.865
2025-09-13 23:17:44,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1402.3156, 3541.4514, 3497.0386, 3747.7012, 3515.4226, 3497.7893, 3512.0935, 3581.8757, 1117.6655, 2785.263]
2025-09-13 23:17:44,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [447.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 367.0, 821.0]
2025-09-13 23:17:44,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 17 minutes, 2 seconds)
2025-09-13 23:28:09,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:28:09,395 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:32:16,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2876.85815 ± 1034.057
2025-09-13 23:32:16,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3594.3289, 1046.2211, 3658.3958, 3370.7307, 777.0089, 2488.6804, 3450.503, 3317.8013, 3652.5813, 3412.3293]
2025-09-13 23:32:16,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 322.0, 1000.0, 1000.0, 277.0, 711.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:32:16,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 55 minutes, 5 seconds)
2025-09-13 23:43:08,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:43:08,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:47:48,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3264.74756 ± 699.199
2025-09-13 23:47:48,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3565.538, 2548.657, 3572.7075, 3660.7964, 3752.7356, 1401.7369, 3604.4138, 3594.6672, 3389.8958, 3556.3257]
2025-09-13 23:47:48,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 726.0, 1000.0, 1000.0, 1000.0, 444.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-13 23:47:48,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3264.75) for latency ExtremeSparseL4U32
2025-09-13 23:47:48,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 12 hours, 40 minutes, 13 seconds)
2025-09-13 23:58:53,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:58:53,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:03:39,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3421.12817 ± 552.173
2025-09-14 00:03:39,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1774.093, 3613.5764, 3518.4353, 3510.0093, 3654.9404, 3587.0754, 3724.1814, 3598.99, 3637.0205, 3592.9612]
2025-09-14 00:03:39,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [521.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 00:03:39,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3421.13) for latency ExtremeSparseL4U32
2025-09-14 00:03:39,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 42 minutes, 24 seconds)
2025-09-14 00:14:42,976 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:14:42,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:18:58,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3053.22437 ± 802.156
2025-09-14 00:18:58,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3615.266, 3632.6333, 2227.0945, 3618.5994, 3575.1875, 3548.9058, 3337.6047, 1602.4213, 3638.3452, 1736.1855]
2025-09-14 00:18:58,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 661.0, 1000.0, 1000.0, 1000.0, 932.0, 509.0, 1000.0, 519.0]
2025-09-14 00:18:59,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 24 minutes, 16 seconds)
2025-09-14 00:29:21,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:29:21,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:33:59,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3242.23926 ± 745.060
2025-09-14 00:33:59,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3574.1187, 3645.4797, 3650.331, 1108.0975, 3574.0999, 2909.3364, 3242.9973, 3524.9263, 3648.6062, 3544.4]
2025-09-14 00:33:59,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 399.0, 1000.0, 833.0, 927.0, 1000.0, 1000.0, 1000.0]
2025-09-14 00:33:59,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 11 minutes, 55 seconds)
2025-09-14 00:44:06,093 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:44:06,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:48:49,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3443.31982 ± 645.407
2025-09-14 00:48:49,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3242.2039, 3618.6328, 3684.454, 3718.3577, 1557.6632, 3673.3796, 3682.8325, 3823.6895, 3739.9778, 3692.0068]
2025-09-14 00:48:49,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [895.0, 1000.0, 954.0, 988.0, 486.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 00:48:49,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3443.32) for latency ExtremeSparseL4U32
2025-09-14 00:48:49,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 11 hours, 59 minutes, 28 seconds)
2025-09-14 00:59:34,587 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:59:34,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:03:44,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2946.79004 ± 1060.775
2025-09-14 01:03:44,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3241.6472, 1429.3967, 3732.2603, 3674.1938, 3668.9011, 1114.4194, 3676.6482, 1500.6002, 3585.1138, 3844.721]
2025-09-14 01:03:44,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 1000.0, 1000.0, 1000.0, 337.0, 1000.0, 435.0, 1000.0, 1000.0]
2025-09-14 01:03:44,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 11 hours, 38 minutes, 38 seconds)
2025-09-14 01:14:30,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:14:30,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:19:18,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3450.40967 ± 465.889
2025-09-14 01:19:18,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2375.182, 3552.6748, 3801.3047, 3629.188, 3780.3325, 3546.6882, 3645.1907, 3754.312, 3701.7937, 2717.4314]
2025-09-14 01:19:18,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [723.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 759.0]
2025-09-14 01:19:18,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3450.41) for latency ExtremeSparseL4U32
2025-09-14 01:19:18,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 11 hours, 20 minutes, 52 seconds)
2025-09-14 01:30:42,558 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:30:42,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:34:48,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3093.15674 ± 1065.058
2025-09-14 01:34:48,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3683.8562, 2472.36, 3960.8755, 3821.0928, 3485.6943, 1954.7556, 3857.2258, 3778.6528, 3415.8928, 501.1609]
2025-09-14 01:34:48,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 671.0, 1000.0, 1000.0, 896.0, 539.0, 1000.0, 1000.0, 905.0, 185.0]
2025-09-14 01:34:48,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 7 minutes, 19 seconds)
2025-09-14 01:45:47,066 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:45:47,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:50:22,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3213.13940 ± 716.944
2025-09-14 01:50:22,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3528.873, 3609.669, 3692.584, 3749.291, 1729.9846, 1938.0875, 3062.7695, 3563.8525, 3786.682, 3469.6018]
2025-09-14 01:50:22,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 512.0, 605.0, 860.0, 1000.0, 1000.0, 1000.0]
2025-09-14 01:50:22,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 10 hours, 56 minutes, 51 seconds)
2025-09-14 02:00:57,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:00:57,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:05:08,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2988.32886 ± 926.350
2025-09-14 02:05:08,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3724.3997, 1840.5012, 3863.4062, 3530.6248, 1195.4882, 3201.0105, 3583.9075, 1815.0186, 3625.0398, 3503.8943]
2025-09-14 02:05:08,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 548.0, 1000.0, 1000.0, 370.0, 880.0, 1000.0, 545.0, 1000.0, 1000.0]
2025-09-14 02:05:08,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 10 hours, 41 minutes, 2 seconds)
2025-09-14 02:15:18,659 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:15:18,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:19:46,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3452.85400 ± 694.421
2025-09-14 02:19:46,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2107.8032, 3783.4963, 3853.329, 3917.979, 3491.1208, 2062.9258, 3713.5264, 3959.6746, 3829.19, 3809.4932]
2025-09-14 02:19:46,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [572.0, 993.0, 984.0, 1000.0, 898.0, 586.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 02:19:46,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3452.85) for latency ExtremeSparseL4U32
2025-09-14 02:19:46,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 23 minutes, 24 seconds)
2025-09-14 02:30:30,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:30:30,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:35:04,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3271.34229 ± 933.430
2025-09-14 02:35:04,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3712.4028, 3741.3823, 3753.9192, 674.3005, 2544.9583, 3464.874, 3694.3914, 3725.3208, 3682.5088, 3719.366]
2025-09-14 02:35:04,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 272.0, 696.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 02:35:04,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 6 minutes, 8 seconds)
2025-09-14 02:46:15,611 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:46:15,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:50:24,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3105.25073 ± 857.213
2025-09-14 02:50:24,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3485.52, 3585.4578, 985.9839, 2645.8093, 3173.541, 3876.9468, 2332.7832, 3326.911, 3778.2803, 3861.2737]
2025-09-14 02:50:24,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [886.0, 1000.0, 322.0, 685.0, 826.0, 1000.0, 641.0, 896.0, 1000.0, 1000.0]
2025-09-14 02:50:24,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 9 hours, 49 minutes, 35 seconds)
2025-09-14 03:01:38,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:01:38,386 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:05:56,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3161.73389 ± 922.023
2025-09-14 03:05:56,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1066.9392, 3825.617, 3754.9207, 3821.3428, 3798.155, 3641.5874, 3592.1746, 2592.3975, 1919.1107, 3605.094]
2025-09-14 03:05:56,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [362.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 693.0, 557.0, 1000.0]
2025-09-14 03:05:56,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 9 hours, 34 minutes, 24 seconds)
2025-09-14 03:16:04,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:16:04,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:19:54,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2803.61865 ± 1052.966
2025-09-14 03:19:54,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1940.5419, 3819.5725, 3650.13, 2639.521, 2365.799, 1389.4225, 3752.2915, 916.25287, 3805.9478, 3756.7075]
2025-09-14 03:19:54,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [551.0, 1000.0, 1000.0, 726.0, 625.0, 444.0, 1000.0, 305.0, 1000.0, 1000.0]
2025-09-14 03:19:54,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 13 minutes, 21 seconds)
2025-09-14 03:31:02,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:31:02,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:35:30,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3206.49951 ± 585.160
2025-09-14 03:35:30,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3358.9426, 2496.5205, 3593.5725, 3643.5127, 2420.708, 3667.521, 3562.01, 3567.913, 3668.3643, 2085.9326]
2025-09-14 03:35:30,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 709.0, 1000.0, 1000.0, 697.0, 1000.0, 1000.0, 1000.0, 1000.0, 607.0]
2025-09-14 03:35:30,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 5 minutes, 16 seconds)
2025-09-14 03:46:45,577 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:46:45,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:51:12,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3402.98389 ± 808.214
2025-09-14 03:51:12,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3659.1206, 3850.7075, 3999.2373, 1573.4993, 2204.5522, 3098.9092, 3897.297, 3880.861, 3950.4314, 3915.2253]
2025-09-14 03:51:12,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 456.0, 599.0, 793.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 03:51:12,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 52 minutes, 55 seconds)
2025-09-14 04:01:03,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:01:03,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:05:41,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3469.77393 ± 328.854
2025-09-14 04:05:41,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3192.1633, 3629.7185, 3726.307, 3691.747, 2859.8452, 3774.548, 3714.4023, 3360.2844, 3767.287, 2981.4363]
2025-09-14 04:05:41,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [867.0, 1000.0, 1000.0, 1000.0, 752.0, 1000.0, 1000.0, 892.0, 1000.0, 802.0]
2025-09-14 04:05:41,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3469.77) for latency ExtremeSparseL4U32
2025-09-14 04:05:41,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 31 minutes, 57 seconds)
2025-09-14 04:16:51,940 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:16:51,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:21:08,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3182.78003 ± 734.781
2025-09-14 04:21:08,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3825.337, 2442.3132, 3770.3647, 1598.1317, 3841.1655, 3558.1504, 2422.448, 3108.8518, 3736.855, 3524.1826]
2025-09-14 04:21:08,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 682.0, 1000.0, 447.0, 1000.0, 1000.0, 666.0, 853.0, 1000.0, 934.0]
2025-09-14 04:21:08,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 16 minutes, 17 seconds)
2025-09-14 04:31:49,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:31:49,736 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:35:46,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2929.11816 ± 999.743
2025-09-14 04:35:46,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3712.0972, 2444.4917, 3899.6362, 3782.4827, 3912.06, 1930.306, 3045.248, 1425.0146, 3790.4026, 1349.4432]
2025-09-14 04:35:46,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 652.0, 1000.0, 1000.0, 1000.0, 562.0, 814.0, 411.0, 1000.0, 421.0]
2025-09-14 04:35:46,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 5 minutes, 31 seconds)
2025-09-14 04:46:48,126 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:46:48,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:51:22,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3412.65283 ± 929.501
2025-09-14 04:51:22,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3832.462, 3428.9934, 3896.0757, 3801.647, 3888.7134, 854.46234, 3945.558, 2674.7742, 3898.8267, 3905.014]
2025-09-14 04:51:22,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 315.0, 1000.0, 697.0, 1000.0, 1000.0]
2025-09-14 04:51:22,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 50 minutes, 24 seconds)
2025-09-14 05:02:36,102 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:02:36,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:07:19,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3469.45898 ± 848.059
2025-09-14 05:07:19,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3770.1294, 3868.47, 3647.3062, 3806.6565, 3712.7805, 3780.8743, 3799.9187, 3721.2922, 932.89984, 3654.262]
2025-09-14 05:07:19,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 307.0, 1000.0]
2025-09-14 05:07:19,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 36 minutes, 42 seconds)
2025-09-14 05:18:04,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:18:04,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:22:26,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3284.79883 ± 794.170
2025-09-14 05:22:26,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3783.706, 3233.6453, 2098.5286, 3283.3257, 3837.4133, 3660.1033, 3805.0051, 3821.2393, 1464.7645, 3860.2585]
2025-09-14 05:22:26,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 830.0, 605.0, 891.0, 1000.0, 1000.0, 1000.0, 1000.0, 420.0, 1000.0]
2025-09-14 05:22:26,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 25 minutes, 7 seconds)
2025-09-14 05:33:04,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:33:04,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:38:06,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3785.57178 ± 128.415
2025-09-14 05:38:06,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3989.3342, 3766.6582, 3828.7566, 3859.711, 3856.9119, 3535.9385, 3744.2476, 3596.7559, 3787.4504, 3889.9585]
2025-09-14 05:38:06,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 939.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 05:38:06,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3785.57) for latency ExtremeSparseL4U32
2025-09-14 05:38:06,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 10 minutes, 57 seconds)
2025-09-14 05:48:10,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:48:10,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:53:12,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3661.17456 ± 238.618
2025-09-14 05:53:12,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3804.109, 3661.2646, 2961.5557, 3768.895, 3779.4888, 3735.7725, 3760.8352, 3688.682, 3793.9014, 3657.2434]
2025-09-14 05:53:12,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 850.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 05:53:12,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 58 minutes, 8 seconds)
2025-09-14 06:04:02,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:04:02,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:09:01,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3758.88940 ± 79.943
2025-09-14 06:09:01,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3825.384, 3806.1492, 3898.2764, 3774.0017, 3580.6284, 3731.2122, 3745.8462, 3693.8293, 3782.995, 3750.5698]
2025-09-14 06:09:01,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 943.0, 1000.0, 1000.0]
2025-09-14 06:09:01,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 43 minutes, 44 seconds)
2025-09-14 06:20:08,329 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:20:08,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:24:38,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3360.16724 ± 864.199
2025-09-14 06:24:38,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1162.3606, 3652.2449, 3665.4387, 3757.9077, 3765.1887, 3802.627, 3891.3003, 3935.2688, 2261.2432, 3708.0947]
2025-09-14 06:24:38,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [362.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 613.0, 1000.0]
2025-09-14 06:24:38,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 26 minutes, 33 seconds)
2025-09-14 06:35:32,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:35:32,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:40:33,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3843.60083 ± 136.884
2025-09-14 06:40:33,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3859.1792, 3867.8008, 3759.0752, 3949.8542, 4022.3362, 3824.6335, 3737.31, 3681.9, 3646.431, 4087.4902]
2025-09-14 06:40:33,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 990.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 06:40:33,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3843.60) for latency ExtremeSparseL4U32
2025-09-14 06:40:33,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 14 minutes, 55 seconds)
2025-09-14 06:50:57,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:50:57,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:54:56,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3006.10986 ± 1319.398
2025-09-14 06:54:56,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3845.6929, 3928.7085, 858.3753, 1036.0482, 1089.0948, 3913.691, 3825.0452, 3929.8872, 3718.6265, 3915.9297]
2025-09-14 06:54:56,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 275.0, 337.0, 346.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 06:54:56,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 53 minutes, 26 seconds)
2025-09-14 07:05:41,404 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:05:41,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:10:23,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3492.90112 ± 652.923
2025-09-14 07:10:23,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3821.8691, 3785.2976, 3949.6785, 1941.7574, 3835.9304, 3815.334, 2488.2783, 3814.3206, 3743.6453, 3732.8987]
2025-09-14 07:10:23,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 985.0, 559.0, 1000.0, 1000.0, 670.0, 1000.0, 1000.0, 1000.0]
2025-09-14 07:10:23,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 39 minutes, 34 seconds)
2025-09-14 07:21:03,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:21:03,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:25:21,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3276.74536 ± 894.139
2025-09-14 07:25:21,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [930.0327, 2822.445, 3872.6345, 2887.1614, 3862.0342, 3740.8103, 3782.0925, 2991.0317, 3960.0085, 3919.2039]
2025-09-14 07:25:21,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [278.0, 751.0, 1000.0, 781.0, 1000.0, 1000.0, 1000.0, 781.0, 1000.0, 1000.0]
2025-09-14 07:25:21,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 20 minutes, 39 seconds)
2025-09-14 07:36:36,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:36:36,084 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:40:49,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3048.17456 ± 1027.631
2025-09-14 07:40:49,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3415.8066, 3598.2468, 3556.5796, 1668.0857, 2441.8508, 718.1707, 3800.69, 3584.1807, 3734.1414, 3963.9941]
2025-09-14 07:40:49,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [897.0, 1000.0, 1000.0, 475.0, 702.0, 251.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 07:40:49,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 4 minutes, 47 seconds)
2025-09-14 07:51:11,223 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:51:11,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:56:07,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3758.18237 ± 187.138
2025-09-14 07:56:07,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3905.4077, 3895.0334, 3877.9878, 3751.3516, 3747.9644, 3812.1206, 3224.9524, 3739.0547, 3841.0278, 3786.9226]
2025-09-14 07:56:07,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 836.0, 1000.0, 1000.0, 1000.0]
2025-09-14 07:56:07,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 47 minutes, 10 seconds)
2025-09-14 08:06:47,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:06:47,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:11:41,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3695.08740 ± 251.606
2025-09-14 08:11:41,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3877.6973, 3894.2573, 3659.74, 3695.1624, 3716.7954, 3775.6685, 3769.1052, 2970.988, 3765.9927, 3825.4673]
2025-09-14 08:11:41,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 785.0, 1000.0, 1000.0]
2025-09-14 08:11:41,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 36 minutes, 20 seconds)
2025-09-14 08:22:56,965 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:22:56,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:27:55,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3825.54761 ± 145.913
2025-09-14 08:27:55,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4038.4136, 3810.3772, 3943.2842, 3949.5337, 3847.5945, 3464.8662, 3770.0942, 3794.455, 3777.15, 3859.7083]
2025-09-14 08:27:55,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 907.0, 1000.0, 982.0, 1000.0, 1000.0]
2025-09-14 08:27:55,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 23 minutes, 37 seconds)
2025-09-14 08:38:18,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:38:18,214 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:42:12,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2966.68213 ± 1189.177
2025-09-14 08:42:12,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2038.4636, 3831.3113, 3913.0735, 3897.0808, 3923.162, 3910.1458, 4025.3665, 982.6411, 1679.675, 1465.9009]
2025-09-14 08:42:12,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [549.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 316.0, 479.0, 449.0]
2025-09-14 08:42:12,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 5 minutes, 54 seconds)
2025-09-14 08:53:32,734 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:53:32,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:58:39,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3852.93750 ± 102.244
2025-09-14 08:58:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3774.3657, 3896.0173, 3880.404, 3977.7092, 3587.2173, 3834.841, 3897.834, 3894.655, 3859.5503, 3926.7822]
2025-09-14 08:58:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 08:58:39,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3852.94) for latency ExtremeSparseL4U32
2025-09-14 08:58:39,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 53 minutes, 28 seconds)
2025-09-14 09:08:59,908 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:08:59,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:13:18,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3278.43433 ± 924.300
2025-09-14 09:13:18,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2705.8557, 3872.619, 3769.8093, 3835.9941, 1581.9607, 3896.9414, 3880.1848, 3829.372, 3873.7195, 1537.8878]
2025-09-14 09:13:18,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [719.0, 1000.0, 1000.0, 1000.0, 463.0, 1000.0, 1000.0, 1000.0, 1000.0, 467.0]
2025-09-14 09:13:18,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 36 minutes, 7 seconds)
2025-09-14 09:25:02,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:25:02,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:29:18,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3317.85352 ± 1005.079
2025-09-14 09:29:18,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4014.4954, 3990.632, 3831.8977, 4047.752, 4050.3674, 1946.1506, 3933.4204, 2626.7673, 3651.0596, 1085.9918]
2025-09-14 09:29:18,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 511.0, 1000.0, 709.0, 1000.0, 321.0]
2025-09-14 09:29:18,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 21 minutes, 47 seconds)
2025-09-14 09:39:14,024 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:39:14,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:43:30,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3308.09106 ± 939.290
2025-09-14 09:43:30,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4019.4465, 3887.9844, 905.134, 3014.984, 3034.209, 3973.1204, 3941.9385, 3869.5325, 2557.9302, 3876.632]
2025-09-14 09:43:30,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 281.0, 753.0, 791.0, 1000.0, 1000.0, 1000.0, 670.0, 1000.0]
2025-09-14 09:43:30,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 1 minute, 25 seconds)
2025-09-14 09:54:46,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:54:46,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:59:39,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3685.80811 ± 256.122
2025-09-14 09:59:39,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3839.5715, 3761.6294, 3886.2156, 3697.3367, 3921.2485, 3901.8076, 3576.6094, 3035.8384, 3761.8665, 3475.956]
2025-09-14 09:59:39,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 874.0, 1000.0, 902.0]
2025-09-14 09:59:39,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 50 minutes, 22 seconds)
2025-09-14 10:10:24,497 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:10:24,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:14:54,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3382.42920 ± 905.823
2025-09-14 10:14:54,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3873.9846, 3806.272, 3363.2449, 3712.6626, 3920.0964, 2383.7078, 1029.4302, 3828.3337, 3972.5586, 3934.0027]
2025-09-14 10:14:54,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 869.0, 1000.0, 1000.0, 686.0, 330.0, 1000.0, 1000.0, 1000.0]
2025-09-14 10:14:54,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 32 minutes, 29 seconds)
2025-09-14 10:25:31,693 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:25:31,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:30:10,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3529.47192 ± 891.776
2025-09-14 10:30:10,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3877.8254, 3786.075, 3909.4053, 4038.0652, 3887.0803, 3952.916, 3008.4226, 985.58246, 3923.1836, 3926.1633]
2025-09-14 10:30:10,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 783.0, 332.0, 1000.0, 1000.0]
2025-09-14 10:30:10,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 18 minutes, 21 seconds)
2025-09-14 10:41:45,765 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:41:45,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:46:00,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3207.48633 ± 1060.912
2025-09-14 10:46:00,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3934.3284, 3967.5513, 3073.8718, 3954.7817, 2815.049, 3896.1675, 1790.2306, 781.8235, 3872.5605, 3988.4983]
2025-09-14 10:46:00,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 810.0, 1000.0, 736.0, 1000.0, 503.0, 285.0, 1000.0, 1000.0]
2025-09-14 10:46:00,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 2 minutes, 43 seconds)
2025-09-14 10:56:35,090 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:56:35,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:01:35,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3945.32568 ± 148.278
2025-09-14 11:01:35,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [4033.7173, 3965.733, 3531.954, 3918.7947, 4006.3762, 3983.4539, 3885.2747, 4044.477, 4079.4636, 4004.0134]
2025-09-14 11:01:35,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 900.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 11:01:35,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1226 [INFO]: New best (3945.33) for latency ExtremeSparseL4U32
2025-09-14 11:01:35,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 49 minutes, 18 seconds)
2025-09-14 11:11:44,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:11:44,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:16:27,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3720.26099 ± 368.165
2025-09-14 11:16:27,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3930.174, 4065.153, 3895.6345, 3529.4246, 4025.128, 3002.1362, 3882.0815, 3992.6785, 3080.0693, 3800.1284]
2025-09-14 11:16:27,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 867.0, 1000.0, 807.0, 1000.0, 1000.0, 780.0, 1000.0]
2025-09-14 11:16:27,336 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 32 minutes, 9 seconds)
2025-09-14 11:27:18,524 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:27:18,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:31:27,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3095.61670 ± 1122.635
2025-09-14 11:31:27,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3827.924, 1651.131, 3762.5547, 1590.418, 964.21826, 3734.586, 3865.9934, 3885.3782, 3870.5166, 3803.4495]
2025-09-14 11:31:27,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 467.0, 1000.0, 502.0, 305.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 11:31:27,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 16 minutes, 33 seconds)
2025-09-14 11:43:00,683 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:43:00,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:47:22,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3227.75732 ± 926.571
2025-09-14 11:47:22,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2155.0854, 2703.801, 3940.5735, 3928.047, 3986.354, 3373.049, 3780.505, 3533.6062, 1038.3726, 3838.177]
2025-09-14 11:47:22,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [595.0, 739.0, 1000.0, 1000.0, 1000.0, 885.0, 1000.0, 1000.0, 320.0, 1000.0]
2025-09-14 11:47:22,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 1 minute, 45 seconds)
2025-09-14 11:58:28,453 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:58:28,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:03:23,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3704.85107 ± 354.067
2025-09-14 12:03:23,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3766.3804, 3814.1194, 3872.0679, 3832.624, 3724.672, 3829.6438, 3863.4932, 3825.2927, 2650.8508, 3869.369]
2025-09-14 12:03:23,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 715.0, 1000.0]
2025-09-14 12:03:23,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 46 minutes, 25 seconds)
2025-09-14 12:14:10,847 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:14:10,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:18:05,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 2882.68994 ± 1262.974
2025-09-14 12:18:05,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3179.589, 3811.9158, 1994.1826, 831.87964, 3829.7021, 3863.982, 3789.956, 375.68762, 3771.9714, 3378.033]
2025-09-14 12:18:05,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [805.0, 1000.0, 556.0, 276.0, 1000.0, 1000.0, 1000.0, 162.0, 1000.0, 866.0]
2025-09-14 12:18:05,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 30 minutes, 35 seconds)
2025-09-14 12:28:39,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:28:39,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:32:59,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3283.41675 ± 1034.149
2025-09-14 12:32:59,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3721.0217, 456.7926, 3924.4473, 3844.2332, 3035.3372, 2558.8948, 3822.4004, 3802.3901, 3702.8293, 3965.8208]
2025-09-14 12:32:59,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 191.0, 1000.0, 1000.0, 811.0, 682.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-14 12:32:59,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 15 minutes, 18 seconds)
2025-09-14 12:43:50,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:43:50,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:48:17,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 3345.76099 ± 849.772
2025-09-14 12:48:17,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [3953.461, 4021.2727, 3912.9814, 3559.6292, 2474.1013, 3974.9646, 3787.7485, 1392.1836, 3827.9717, 2553.2986]
2025-09-14 12:48:17,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 649.0, 1000.0, 1000.0, 409.0, 1000.0, 684.0]
2025-09-14 12:48:17,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-walker2d):1251 [DEBUG]: Training session finished
