2025-09-13 18:23:07,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:23:07,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-walker2d/ExtremeSparseL4U32-mbpac_memdelay
2025-09-13 18:23:07,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x153e9bb60190>}
2025-09-13 18:23:07,102 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1111 [DEBUG]: using device: cuda
2025-09-13 18:23:07,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1133 [INFO]: Creating new trainer
2025-09-13 18:23:07,112 baseline-mbpac-noiseperc10-walker2d:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-13 18:23:07,113 baseline-mbpac-noiseperc10-walker2d:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 18:23:07,120 baseline-mbpac-noiseperc10-walker2d:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-13 18:23:08,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1194 [DEBUG]: Starting training session...
2025-09-13 18:23:08,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 1/100
2025-09-13 18:34:03,348 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:34:03,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:35:04,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 245.27603 ± 108.498
2025-09-13 18:35:04,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [332.5959, 346.0889, 300.94992, 283.3852, 396.98755, 36.98451, 177.8713, 237.5243, 251.10634, 89.26611]
2025-09-13 18:35:04,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 262.0, 180.0, 160.0, 286.0, 140.0, 331.0, 126.0, 150.0, 218.0]
2025-09-13 18:35:04,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (245.28) for latency ExtremeSparseL4U32
2025-09-13 18:35:04,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 41 minutes, 50 seconds)
2025-09-13 18:45:37,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:45:37,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:46:05,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 56.01034 ± 66.334
2025-09-13 18:46:05,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [28.822664, 80.3674, 76.44495, 240.36185, 20.155039, 11.444314, 6.551403, 11.411939, 32.744556, 51.799313]
2025-09-13 18:46:05,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [81.0, 128.0, 71.0, 242.0, 30.0, 24.0, 20.0, 37.0, 72.0, 225.0]
2025-09-13 18:46:05,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 18 hours, 44 minutes, 48 seconds)
2025-09-13 18:56:58,710 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:56:58,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:57:49,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 222.95769 ± 116.935
2025-09-13 18:57:49,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [119.8288, 248.02025, 418.67154, 320.5027, 294.35083, 310.25412, 231.2033, 49.43102, 38.0329, 199.28159]
2025-09-13 18:57:49,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 170.0, 351.0, 203.0, 166.0, 202.0, 137.0, 155.0, 49.0, 115.0]
2025-09-13 18:57:49,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 18 hours, 41 minutes, 23 seconds)
2025-09-13 19:08:43,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:08:43,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:09:28,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 195.11395 ± 106.380
2025-09-13 19:09:28,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [199.34563, 252.43898, 133.72534, 64.608154, 239.11902, 264.7341, 73.30748, 291.1167, 385.6689, 47.07508]
2025-09-13 19:09:28,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 164.0, 72.0, 93.0, 142.0, 154.0, 120.0, 400.0, 180.0, 86.0]
2025-09-13 19:09:28,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 18 hours, 32 minutes)
2025-09-13 19:20:23,879 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:20:23,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:21:17,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 320.85129 ± 205.519
2025-09-13 19:21:17,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [320.21426, 481.18045, 457.3852, 298.8361, 20.983953, 68.892586, 42.539234, 357.02335, 652.50995, 508.9478]
2025-09-13 19:21:17,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 230.0, 247.0, 151.0, 35.0, 86.0, 51.0, 176.0, 371.0, 278.0]
2025-09-13 19:21:17,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (320.85) for latency ExtremeSparseL4U32
2025-09-13 19:21:17,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 18 hours, 25 minutes, 1 second)
2025-09-13 19:32:10,278 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:32:10,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:33:16,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 311.47452 ± 206.530
2025-09-13 19:33:16,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [98.6184, 250.87349, 512.3509, 90.48385, 71.66677, 782.17566, 276.47083, 401.13586, 318.0595, 312.90994]
2025-09-13 19:33:16,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 161.0, 354.0, 101.0, 133.0, 616.0, 145.0, 266.0, 166.0, 164.0]
2025-09-13 19:33:16,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 14 minutes, 1 second)
2025-09-13 19:44:18,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:44:18,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:45:05,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 184.40154 ± 117.947
2025-09-13 19:45:05,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [37.912514, 107.78904, 137.79158, 363.1062, 231.9983, 76.53013, 41.310127, 378.18045, 253.29378, 216.10321]
2025-09-13 19:45:05,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [59.0, 123.0, 167.0, 224.0, 296.0, 91.0, 57.0, 228.0, 168.0, 128.0]
2025-09-13 19:45:05,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 17 minutes, 14 seconds)
2025-09-13 19:56:03,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:56:03,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:57:15,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 390.50928 ± 37.484
2025-09-13 19:57:15,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [365.86572, 402.8324, 435.41605, 453.46048, 414.92017, 382.97775, 379.6649, 352.18106, 319.69028, 398.0839]
2025-09-13 19:57:15,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 251.0, 261.0, 264.0, 283.0, 248.0, 235.0, 212.0, 171.0, 227.0]
2025-09-13 19:57:15,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (390.51) for latency ExtremeSparseL4U32
2025-09-13 19:57:15,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 13 minutes, 44 seconds)
2025-09-13 20:08:14,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:08:14,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:09:07,071 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 276.64188 ± 61.507
2025-09-13 20:09:07,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [202.40324, 324.1554, 253.83836, 275.81833, 332.48163, 209.33061, 417.10385, 258.00754, 249.44702, 243.83269]
2025-09-13 20:09:07,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [123.0, 246.0, 135.0, 234.0, 208.0, 112.0, 261.0, 171.0, 154.0, 143.0]
2025-09-13 20:09:07,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 5 minutes, 33 seconds)
2025-09-13 20:20:21,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:20:21,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:20:57,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 208.64468 ± 100.891
2025-09-13 20:20:57,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [250.18832, 253.95161, 227.74435, 39.60469, 184.95203, 351.33536, 257.76126, 302.44324, 201.8983, 16.567625]
2025-09-13 20:20:57,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 154.0, 104.0, 73.0, 123.0, 215.0, 119.0, 175.0, 112.0, 24.0]
2025-09-13 20:20:57,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 53 minutes, 57 seconds)
2025-09-13 20:31:50,067 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:31:50,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:32:33,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 230.42111 ± 92.674
2025-09-13 20:32:33,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [266.6758, 394.747, 182.50516, 214.61407, 117.149376, 173.40874, 265.1412, 116.417175, 382.65573, 190.89691]
2025-09-13 20:32:33,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 234.0, 138.0, 137.0, 133.0, 101.0, 163.0, 72.0, 196.0, 135.0]
2025-09-13 20:32:33,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 35 minutes, 13 seconds)
2025-09-13 20:43:33,292 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:43:33,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:44:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 306.00644 ± 125.079
2025-09-13 20:44:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [294.3016, 213.91759, 302.07907, 33.504932, 410.26086, 341.85922, 516.46423, 241.1169, 288.35236, 418.2077]
2025-09-13 20:44:22,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 123.0, 141.0, 43.0, 245.0, 173.0, 274.0, 136.0, 149.0, 253.0]
2025-09-13 20:44:22,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 23 minutes, 36 seconds)
2025-09-13 20:55:32,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:55:32,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:56:26,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 344.50043 ± 201.448
2025-09-13 20:56:26,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [364.10147, 423.37634, 275.29922, 238.24179, 869.5729, 308.95465, 306.05295, 23.524984, 329.11017, 306.7699]
2025-09-13 20:56:26,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [187.0, 190.0, 163.0, 134.0, 398.0, 153.0, 164.0, 33.0, 189.0, 173.0]
2025-09-13 20:56:26,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 9 minutes, 40 seconds)
2025-09-13 21:07:21,068 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:07:21,070 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:08:05,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 284.99042 ± 70.921
2025-09-13 21:08:05,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [222.21532, 273.5309, 174.17029, 436.02917, 249.40318, 360.52267, 268.34805, 289.54205, 331.9094, 244.23323]
2025-09-13 21:08:05,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 143.0, 110.0, 188.0, 135.0, 181.0, 151.0, 173.0, 163.0, 128.0]
2025-09-13 21:08:05,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 16 hours, 54 minutes, 17 seconds)
2025-09-13 21:19:10,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:19:10,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:19:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 307.13538 ± 122.038
2025-09-13 21:19:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [342.72076, 426.69397, 396.2718, 256.5501, 471.54477, 216.98228, 247.6112, 407.24808, 37.658066, 268.07306]
2025-09-13 21:19:54,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 171.0, 181.0, 148.0, 234.0, 116.0, 138.0, 172.0, 49.0, 136.0]
2025-09-13 21:19:54,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 16 hours, 42 minutes, 3 seconds)
2025-09-13 21:31:00,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:31:00,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:31:38,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 279.82584 ± 170.892
2025-09-13 21:31:38,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [369.16022, 405.66415, 393.85916, 432.2678, 388.46503, 13.597968, 9.525565, 66.79953, 264.12405, 454.79492]
2025-09-13 21:31:38,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 175.0, 157.0, 173.0, 174.0, 25.0, 21.0, 66.0, 127.0, 200.0]
2025-09-13 21:31:38,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 32 minutes, 44 seconds)
2025-09-13 21:42:36,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:42:36,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:43:17,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 304.34949 ± 172.766
2025-09-13 21:43:17,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [582.13617, 352.46057, 13.891433, 451.89368, 198.7572, 15.968796, 418.57123, 297.6774, 348.55838, 363.58]
2025-09-13 21:43:17,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [210.0, 152.0, 32.0, 203.0, 127.0, 23.0, 169.0, 142.0, 162.0, 163.0]
2025-09-13 21:43:17,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 17 minutes, 48 seconds)
2025-09-13 21:54:13,123 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:54:13,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:55:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 378.82483 ± 78.511
2025-09-13 21:55:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [532.40643, 391.9928, 275.07162, 311.43732, 372.45123, 429.77222, 394.52472, 393.55853, 435.6597, 251.37383]
2025-09-13 21:55:05,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 181.0, 149.0, 153.0, 184.0, 189.0, 168.0, 189.0, 173.0, 164.0]
2025-09-13 21:55:05,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 1 minute, 56 seconds)
2025-09-13 22:06:20,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:06:20,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:07:05,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 344.27750 ± 115.160
2025-09-13 22:07:05,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [517.9085, 360.4597, 535.2647, 233.84506, 197.13084, 371.86227, 351.34848, 317.62186, 382.96133, 174.37209]
2025-09-13 22:07:05,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 162.0, 210.0, 102.0, 93.0, 168.0, 148.0, 146.0, 175.0, 98.0]
2025-09-13 22:07:05,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 15 hours, 55 minutes, 56 seconds)
2025-09-13 22:17:59,731 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:17:59,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:19:11,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 568.08557 ± 159.241
2025-09-13 22:19:11,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [315.01144, 383.6182, 650.5006, 564.16925, 678.1793, 513.94867, 712.24524, 658.3889, 376.17584, 828.61835]
2025-09-13 22:19:11,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 186.0, 261.0, 215.0, 359.0, 196.0, 302.0, 300.0, 171.0, 302.0]
2025-09-13 22:19:11,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (568.09) for latency ExtremeSparseL4U32
2025-09-13 22:19:11,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 15 hours, 48 minutes, 32 seconds)
2025-09-13 22:30:12,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:30:12,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:31:21,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 584.75793 ± 115.626
2025-09-13 22:31:21,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [609.3186, 615.9107, 459.03842, 551.5385, 536.7896, 505.7704, 434.86948, 648.18164, 620.5067, 865.65466]
2025-09-13 22:31:21,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 220.0, 187.0, 226.0, 295.0, 188.0, 173.0, 240.0, 226.0, 321.0]
2025-09-13 22:31:21,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (584.76) for latency ExtremeSparseL4U32
2025-09-13 22:31:21,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 15 hours, 43 minutes, 26 seconds)
2025-09-13 22:42:31,212 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:42:31,219 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:43:28,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 460.45038 ± 93.396
2025-09-13 22:43:28,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [403.46722, 456.12296, 615.7903, 363.90616, 366.08502, 338.6353, 493.17276, 617.12164, 456.11063, 494.09198]
2025-09-13 22:43:28,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 183.0, 213.0, 171.0, 167.0, 171.0, 216.0, 249.0, 189.0, 219.0]
2025-09-13 22:43:28,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 38 minutes, 54 seconds)
2025-09-13 22:54:35,256 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:54:35,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:55:37,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 528.14844 ± 179.096
2025-09-13 22:55:37,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [649.60815, 509.27176, 565.4933, 550.9205, 7.336079, 603.26984, 647.5686, 605.37354, 533.31415, 609.3284]
2025-09-13 22:55:37,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [258.0, 198.0, 219.0, 222.0, 28.0, 266.0, 233.0, 226.0, 199.0, 244.0]
2025-09-13 22:55:37,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 32 minutes, 16 seconds)
2025-09-13 23:06:25,833 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:06:25,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:07:25,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 512.25549 ± 74.507
2025-09-13 23:07:25,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [601.96844, 610.3608, 395.9772, 557.94684, 457.60965, 543.4347, 397.19412, 573.87225, 506.65833, 477.5323]
2025-09-13 23:07:25,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 262.0, 164.0, 200.0, 170.0, 197.0, 154.0, 231.0, 209.0, 183.0]
2025-09-13 23:07:25,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 16 minutes, 59 seconds)
2025-09-13 23:18:45,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:18:45,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:19:56,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 600.31409 ± 103.957
2025-09-13 23:19:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [670.7975, 601.83984, 586.4155, 850.0248, 432.4117, 573.07935, 535.1958, 603.3592, 630.1823, 519.83514]
2025-09-13 23:19:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 209.0, 232.0, 359.0, 199.0, 203.0, 221.0, 226.0, 222.0, 226.0]
2025-09-13 23:19:56,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (600.31) for latency ExtremeSparseL4U32
2025-09-13 23:19:56,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 11 minutes, 18 seconds)
2025-09-13 23:30:49,008 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:30:49,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:31:55,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 576.98309 ± 34.525
2025-09-13 23:31:55,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [652.3249, 612.5503, 532.1829, 536.19244, 553.7144, 580.4899, 574.652, 558.5117, 596.66614, 572.54626]
2025-09-13 23:31:55,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 264.0, 204.0, 211.0, 226.0, 223.0, 206.0, 212.0, 213.0, 244.0]
2025-09-13 23:31:55,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 14 hours, 56 minutes, 26 seconds)
2025-09-13 23:42:57,207 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:42:57,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:44:07,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 593.94806 ± 135.284
2025-09-13 23:44:07,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [285.65918, 668.9858, 656.4939, 511.74127, 681.1117, 422.23312, 666.6819, 738.60443, 629.82526, 678.1436]
2025-09-13 23:44:07,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [187.0, 239.0, 242.0, 210.0, 263.0, 179.0, 279.0, 279.0, 231.0, 264.0]
2025-09-13 23:44:07,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 45 minutes, 38 seconds)
2025-09-13 23:55:17,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:55:17,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:56:28,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 600.15997 ± 204.625
2025-09-13 23:56:28,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [612.26654, 654.2366, 724.7659, 630.4633, 632.1476, 663.9042, 613.733, 830.01514, 623.47534, 16.591772]
2025-09-13 23:56:28,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 242.0, 277.0, 256.0, 256.0, 272.0, 225.0, 296.0, 266.0, 29.0]
2025-09-13 23:56:28,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 36 minutes, 4 seconds)
2025-09-14 00:07:28,562 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:07:28,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:08:47,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 687.53876 ± 99.847
2025-09-14 00:08:47,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [641.49115, 644.72003, 837.5293, 566.2488, 876.64355, 597.6902, 690.941, 759.19446, 591.4961, 669.4329]
2025-09-14 00:08:47,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 257.0, 298.0, 207.0, 327.0, 240.0, 274.0, 297.0, 235.0, 246.0]
2025-09-14 00:08:47,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (687.54) for latency ExtremeSparseL4U32
2025-09-14 00:08:47,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 31 minutes, 20 seconds)
2025-09-14 00:19:44,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:19:44,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:21:02,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 698.09930 ± 91.266
2025-09-14 00:21:02,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [824.0985, 554.69604, 686.8265, 811.87573, 693.4794, 824.5217, 713.07263, 643.10425, 632.06274, 597.2558]
2025-09-14 00:21:02,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [294.0, 221.0, 233.0, 281.0, 262.0, 310.0, 280.0, 230.0, 248.0, 243.0]
2025-09-14 00:21:02,554 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (698.10) for latency ExtremeSparseL4U32
2025-09-14 00:21:02,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 15 minutes, 23 seconds)
2025-09-14 00:32:09,956 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:32:09,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:33:36,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 833.56122 ± 333.765
2025-09-14 00:33:36,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [717.1654, 870.9196, 539.8135, 837.72845, 1162.5535, 1176.6503, 1066.3906, 7.154518, 917.1453, 1040.0908]
2025-09-14 00:33:36,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [263.0, 314.0, 222.0, 300.0, 362.0, 417.0, 348.0, 19.0, 362.0, 325.0]
2025-09-14 00:33:36,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (833.56) for latency ExtremeSparseL4U32
2025-09-14 00:33:36,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 11 minutes, 16 seconds)
2025-09-14 00:44:52,397 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:44:52,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:45:51,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 494.94214 ± 326.637
2025-09-14 00:45:51,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.09396, 12.962323, 11.273767, 719.73193, 664.1388, 575.5904, 590.4845, 691.87683, 826.9849, 845.2839]
2025-09-14 00:45:51,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 20.0, 321.0, 241.0, 227.0, 224.0, 254.0, 333.0, 292.0]
2025-09-14 00:45:51,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 13 hours, 59 minutes, 23 seconds)
2025-09-14 00:56:45,052 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:56:45,054 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:58:12,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 802.91510 ± 133.239
2025-09-14 00:58:12,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [826.3744, 715.63165, 1039.7247, 795.4719, 788.0614, 774.73926, 669.1135, 767.6319, 611.7532, 1040.6488]
2025-09-14 00:58:12,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 261.0, 343.0, 285.0, 292.0, 250.0, 253.0, 292.0, 258.0, 372.0]
2025-09-14 00:58:12,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 47 minutes, 13 seconds)
2025-09-14 01:09:25,365 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:09:25,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:11:05,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 887.60138 ± 193.388
2025-09-14 01:11:05,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [815.0847, 858.1159, 864.8347, 699.8491, 1243.0707, 812.693, 863.13306, 1267.5743, 688.3914, 763.26636]
2025-09-14 01:11:05,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [317.0, 296.0, 347.0, 270.0, 402.0, 316.0, 334.0, 488.0, 271.0, 267.0]
2025-09-14 01:11:05,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (887.60) for latency ExtremeSparseL4U32
2025-09-14 01:11:05,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 42 minutes, 27 seconds)
2025-09-14 01:22:10,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:22:10,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:23:29,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 686.37549 ± 231.557
2025-09-14 01:23:29,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [653.85315, 691.43475, 89.35157, 577.8693, 930.75244, 817.1539, 788.13385, 963.0319, 714.5672, 637.6063]
2025-09-14 01:23:29,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 259.0, 133.0, 226.0, 331.0, 286.0, 306.0, 312.0, 267.0, 248.0]
2025-09-14 01:23:29,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 31 minutes, 48 seconds)
2025-09-14 01:34:28,341 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:34:28,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:36:04,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 887.41150 ± 258.476
2025-09-14 01:36:04,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [682.7795, 736.6837, 832.11066, 610.90845, 1372.0905, 1189.5955, 654.85657, 1073.5789, 628.8043, 1092.7078]
2025-09-14 01:36:04,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [245.0, 268.0, 311.0, 259.0, 486.0, 380.0, 263.0, 366.0, 245.0, 402.0]
2025-09-14 01:36:04,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 19 minutes, 36 seconds)
2025-09-14 01:47:14,282 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:47:14,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:48:27,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 630.46753 ± 232.521
2025-09-14 01:48:27,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [671.3043, 896.3865, 538.8592, 758.7108, 723.6431, 358.04376, 761.2346, 859.1126, 646.28107, 91.09933]
2025-09-14 01:48:27,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 318.0, 216.0, 299.0, 254.0, 161.0, 293.0, 281.0, 253.0, 151.0]
2025-09-14 01:48:27,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 8 minutes, 54 seconds)
2025-09-14 01:59:31,959 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:59:31,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:00:37,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 541.70117 ± 301.395
2025-09-14 02:00:37,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [710.9804, 5.13031, 736.6637, 639.7023, 831.9742, 744.8296, 9.075093, 740.74225, 284.79822, 713.1156]
2025-09-14 02:00:37,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 15.0, 282.0, 242.0, 311.0, 271.0, 18.0, 313.0, 149.0, 272.0]
2025-09-14 02:00:37,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 53 minutes, 58 seconds)
2025-09-14 02:11:32,953 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:11:32,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:12:58,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 761.11255 ± 263.304
2025-09-14 02:12:58,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [775.118, 864.7201, 59.844723, 807.2565, 1063.4232, 673.7066, 717.60693, 1050.8259, 817.72174, 780.9017]
2025-09-14 02:12:58,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 328.0, 81.0, 287.0, 349.0, 283.0, 271.0, 362.0, 277.0, 279.0]
2025-09-14 02:12:58,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 34 minutes, 57 seconds)
2025-09-14 02:24:08,232 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:24:08,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:25:32,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 710.41370 ± 119.680
2025-09-14 02:25:32,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [752.674, 797.151, 746.9969, 836.57526, 607.2952, 559.7347, 530.3095, 841.4466, 586.24255, 845.71136]
2025-09-14 02:25:32,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 289.0, 281.0, 391.0, 221.0, 229.0, 220.0, 327.0, 247.0, 308.0]
2025-09-14 02:25:32,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 24 minutes, 41 seconds)
2025-09-14 02:36:46,357 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:36:46,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:37:52,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 594.55865 ± 299.070
2025-09-14 02:37:52,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [792.84515, 733.6313, 7.412042, 14.555718, 837.63684, 599.91394, 665.54083, 747.7995, 733.0288, 813.222]
2025-09-14 02:37:52,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [291.0, 313.0, 20.0, 25.0, 305.0, 194.0, 252.0, 287.0, 239.0, 296.0]
2025-09-14 02:37:52,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 9 minutes, 9 seconds)
2025-09-14 02:49:01,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:49:01,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:50:20,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 762.29919 ± 96.918
2025-09-14 02:50:20,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [666.0375, 876.66113, 713.1179, 652.78723, 802.6697, 924.7091, 829.54553, 817.3418, 718.02374, 622.098]
2025-09-14 02:50:20,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 283.0, 251.0, 256.0, 286.0, 312.0, 282.0, 278.0, 264.0, 239.0]
2025-09-14 02:50:20,748 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 57 minutes, 50 seconds)
2025-09-14 03:01:16,446 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:01:16,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:02:39,579 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 779.30103 ± 77.150
2025-09-14 03:02:39,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [839.25256, 766.1955, 761.7779, 762.2099, 763.4141, 959.2546, 728.0921, 691.6901, 684.6655, 836.4579]
2025-09-14 03:02:39,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 274.0, 249.0, 277.0, 269.0, 301.0, 264.0, 265.0, 248.0, 294.0]
2025-09-14 03:02:39,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 47 minutes, 15 seconds)
2025-09-14 03:13:44,567 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:13:44,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:15:12,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 801.27136 ± 151.582
2025-09-14 03:15:12,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [799.8321, 827.82245, 780.0989, 684.6251, 486.68503, 975.2481, 673.3788, 822.2771, 936.06824, 1026.6779]
2025-09-14 03:15:12,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 301.0, 278.0, 276.0, 210.0, 327.0, 281.0, 285.0, 338.0, 380.0]
2025-09-14 03:15:12,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 36 minutes, 59 seconds)
2025-09-14 03:26:16,457 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:26:16,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:27:48,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 849.76721 ± 110.936
2025-09-14 03:27:48,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [807.4851, 1041.8778, 858.10175, 864.11566, 704.375, 704.1405, 793.26733, 780.10583, 1015.5497, 928.6533]
2025-09-14 03:27:48,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [294.0, 351.0, 288.0, 296.0, 259.0, 264.0, 273.0, 307.0, 371.0, 345.0]
2025-09-14 03:27:48,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 24 minutes, 54 seconds)
2025-09-14 03:39:06,946 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:39:06,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:40:29,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 771.00250 ± 169.115
2025-09-14 03:40:29,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [614.12823, 814.757, 907.88116, 1093.4575, 679.37445, 882.98346, 819.1934, 739.937, 718.67126, 439.64154]
2025-09-14 03:40:29,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 293.0, 314.0, 366.0, 256.0, 317.0, 288.0, 260.0, 266.0, 188.0]
2025-09-14 03:40:29,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 16 minutes, 15 seconds)
2025-09-14 03:51:24,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 03:51:24,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 03:52:40,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 739.78473 ± 270.388
2025-09-14 03:52:40,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [918.27826, 754.897, 851.0366, 715.1783, 759.9085, 800.4715, 680.97455, 1104.8837, 807.67053, 4.548184]
2025-09-14 03:52:40,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [307.0, 256.0, 295.0, 267.0, 269.0, 266.0, 240.0, 358.0, 282.0, 17.0]
2025-09-14 03:52:40,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 40 seconds)
2025-09-14 04:03:50,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:03:50,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:05:17,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 804.65198 ± 102.563
2025-09-14 04:05:17,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [670.6019, 948.3991, 955.7363, 762.5084, 947.27094, 765.68475, 764.8361, 712.2028, 814.0134, 705.2659]
2025-09-14 04:05:17,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 323.0, 344.0, 265.0, 327.0, 266.0, 273.0, 301.0, 283.0, 282.0]
2025-09-14 04:05:17,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 51 minutes, 17 seconds)
2025-09-14 04:16:16,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:16:16,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:17:42,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 820.06592 ± 296.716
2025-09-14 04:17:42,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [676.69165, 823.45325, 944.5776, 862.3257, 889.8912, 898.1985, 14.631361, 1146.539, 849.736, 1094.615]
2025-09-14 04:17:42,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 271.0, 306.0, 325.0, 311.0, 294.0, 33.0, 402.0, 295.0, 385.0]
2025-09-14 04:17:42,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 37 minutes, 32 seconds)
2025-09-14 04:28:45,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:28:45,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:30:18,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 862.92792 ± 111.699
2025-09-14 04:30:18,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [802.217, 922.34973, 759.9061, 948.30817, 726.4937, 872.92096, 1109.1287, 788.5458, 764.7459, 934.6634]
2025-09-14 04:30:18,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [279.0, 336.0, 270.0, 348.0, 270.0, 307.0, 371.0, 268.0, 281.0, 339.0]
2025-09-14 04:30:18,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 25 minutes)
2025-09-14 04:41:25,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:41:25,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:43:04,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 913.11438 ± 161.443
2025-09-14 04:43:04,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [983.95465, 815.5715, 772.92444, 575.19165, 1049.7283, 1070.7156, 878.05426, 1131.3243, 828.0009, 1025.6786]
2025-09-14 04:43:04,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 310.0, 294.0, 239.0, 374.0, 364.0, 312.0, 431.0, 287.0, 358.0]
2025-09-14 04:43:04,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (913.11) for latency ExtremeSparseL4U32
2025-09-14 04:43:04,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 13 minutes, 22 seconds)
2025-09-14 04:54:11,791 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 04:54:11,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 04:55:35,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 768.91089 ± 289.035
2025-09-14 04:55:35,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1059.2264, 720.4212, 710.9623, 1155.5845, 797.19415, 904.055, 733.6396, 11.3718405, 797.6697, 798.98395]
2025-09-14 04:55:35,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [366.0, 261.0, 286.0, 393.0, 268.0, 327.0, 263.0, 23.0, 303.0, 279.0]
2025-09-14 04:55:35,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 3 minutes, 58 seconds)
2025-09-14 05:06:32,630 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:06:32,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:08:11,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 920.67206 ± 114.908
2025-09-14 05:08:11,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1138.5377, 915.2648, 898.9346, 1031.616, 855.6484, 1000.45276, 971.86066, 892.6542, 758.0793, 743.67267]
2025-09-14 05:08:11,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [415.0, 313.0, 305.0, 348.0, 300.0, 342.0, 369.0, 326.0, 274.0, 266.0]
2025-09-14 05:08:11,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (920.67) for latency ExtremeSparseL4U32
2025-09-14 05:08:11,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 9 hours, 51 minutes, 24 seconds)
2025-09-14 05:19:21,410 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:19:21,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:20:43,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 751.98840 ± 269.363
2025-09-14 05:20:43,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [912.7734, 633.6272, 743.0181, 782.29095, 8.098834, 850.63666, 731.28485, 966.3193, 948.7465, 943.0885]
2025-09-14 05:20:43,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [337.0, 233.0, 262.0, 298.0, 20.0, 310.0, 267.0, 314.0, 324.0, 362.0]
2025-09-14 05:20:43,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 39 minutes, 46 seconds)
2025-09-14 05:31:44,005 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:31:44,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:32:55,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 667.62183 ± 248.509
2025-09-14 05:32:55,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [668.3334, 660.0336, 834.57635, 869.4275, 9.087103, 964.3939, 543.8783, 744.79816, 739.9403, 641.7495]
2025-09-14 05:32:55,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [225.0, 238.0, 284.0, 303.0, 20.0, 321.0, 226.0, 273.0, 260.0, 237.0]
2025-09-14 05:32:55,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 23 minutes, 29 seconds)
2025-09-14 05:44:02,871 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:44:02,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:45:37,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 907.82532 ± 185.054
2025-09-14 05:45:37,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1284.033, 771.661, 657.6644, 1186.5886, 979.6891, 903.3212, 843.6247, 746.51215, 870.6553, 834.50354]
2025-09-14 05:45:37,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [463.0, 262.0, 231.0, 409.0, 326.0, 303.0, 289.0, 286.0, 297.0, 293.0]
2025-09-14 05:45:37,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 10 minutes, 27 seconds)
2025-09-14 05:56:51,797 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 05:56:51,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 05:58:15,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 824.10596 ± 110.763
2025-09-14 05:58:15,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [845.09546, 1004.79755, 694.4594, 744.05206, 840.45496, 705.2516, 984.11505, 749.64105, 937.8998, 735.2929]
2025-09-14 05:58:15,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [287.0, 318.0, 242.0, 256.0, 322.0, 253.0, 318.0, 278.0, 314.0, 253.0]
2025-09-14 05:58:15,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 8 hours, 58 minutes, 57 seconds)
2025-09-14 06:09:27,136 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:09:27,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:10:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 677.63416 ± 353.512
2025-09-14 06:10:38,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [814.3766, 936.3185, 1144.542, 733.79913, 769.4222, 728.54095, 10.588646, 7.626825, 813.7987, 817.3277]
2025-09-14 06:10:38,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [289.0, 299.0, 384.0, 245.0, 256.0, 269.0, 21.0, 20.0, 323.0, 271.0]
2025-09-14 06:10:38,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 8 hours, 44 minutes, 28 seconds)
2025-09-14 06:21:36,362 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:21:36,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:22:46,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 642.82751 ± 354.554
2025-09-14 06:22:46,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1002.6342, 171.11792, 8.7596445, 742.27423, 729.03015, 1135.5293, 240.94781, 747.89734, 870.029, 780.0555]
2025-09-14 06:22:46,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 108.0, 21.0, 255.0, 259.0, 390.0, 151.0, 291.0, 304.0, 262.0]
2025-09-14 06:22:46,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 8 hours, 28 minutes, 48 seconds)
2025-09-14 06:33:58,041 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:33:58,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:35:30,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 870.59149 ± 262.111
2025-09-14 06:35:30,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [174.30518, 918.6806, 1002.5593, 900.8242, 951.632, 887.0563, 982.82056, 733.7331, 906.76263, 1247.5413]
2025-09-14 06:35:30,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 309.0, 339.0, 304.0, 334.0, 309.0, 315.0, 258.0, 296.0, 456.0]
2025-09-14 06:35:30,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 20 minutes, 42 seconds)
2025-09-14 06:46:42,574 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:46:42,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 06:47:53,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 668.05182 ± 256.732
2025-09-14 06:47:53,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [885.7873, 819.70544, 898.0361, 634.3641, 14.346317, 443.35593, 647.04016, 892.63574, 726.004, 719.2433]
2025-09-14 06:47:53,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 268.0, 308.0, 229.0, 27.0, 201.0, 236.0, 286.0, 248.0, 267.0]
2025-09-14 06:47:53,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 5 minutes, 40 seconds)
2025-09-14 06:58:55,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 06:58:55,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:00:19,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 816.74658 ± 99.188
2025-09-14 07:00:19,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [833.4016, 842.8178, 647.57434, 780.1204, 945.69147, 842.9689, 930.69965, 685.0327, 730.8674, 928.29126]
2025-09-14 07:00:19,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 283.0, 234.0, 268.0, 329.0, 280.0, 303.0, 268.0, 254.0, 317.0]
2025-09-14 07:00:19,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 7 hours, 51 minutes, 45 seconds)
2025-09-14 07:11:32,557 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:11:32,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:12:50,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 712.20160 ± 179.604
2025-09-14 07:12:50,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [933.0659, 619.71796, 478.96893, 351.8745, 909.1831, 859.92194, 691.0469, 851.61694, 697.5175, 729.10223]
2025-09-14 07:12:50,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 229.0, 188.0, 298.0, 282.0, 274.0, 257.0, 279.0, 235.0, 258.0]
2025-09-14 07:12:50,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 7 hours, 40 minutes, 18 seconds)
2025-09-14 07:23:53,279 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:23:53,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:25:16,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 799.97437 ± 140.955
2025-09-14 07:25:16,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [694.1575, 953.21423, 518.37885, 648.4262, 827.06165, 936.67706, 994.78546, 777.7595, 788.1921, 861.09125]
2025-09-14 07:25:16,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 311.0, 210.0, 233.0, 278.0, 321.0, 331.0, 302.0, 260.0, 310.0]
2025-09-14 07:25:16,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 7 hours, 29 minutes, 59 seconds)
2025-09-14 07:36:21,211 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:36:21,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:37:50,976 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 882.76746 ± 150.769
2025-09-14 07:37:50,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [881.38763, 946.6791, 1006.43317, 834.19965, 722.55035, 551.3132, 891.3569, 971.83887, 1131.7283, 890.18774]
2025-09-14 07:37:50,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [312.0, 327.0, 333.0, 273.0, 256.0, 215.0, 311.0, 332.0, 363.0, 302.0]
2025-09-14 07:37:50,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 16 minutes, 22 seconds)
2025-09-14 07:48:55,890 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 07:48:55,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 07:50:10,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 719.33508 ± 253.305
2025-09-14 07:50:10,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.097952, 714.1974, 813.7761, 786.6418, 909.58813, 932.53656, 626.4981, 683.3621, 859.6081, 854.04443]
2025-09-14 07:50:10,713 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 242.0, 285.0, 291.0, 296.0, 310.0, 229.0, 253.0, 290.0, 267.0]
2025-09-14 07:50:10,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 3 minutes, 30 seconds)
2025-09-14 08:01:03,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:01:03,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:02:27,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 802.99615 ± 191.613
2025-09-14 08:02:27,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [919.78723, 940.3142, 669.9106, 841.56616, 712.2042, 1087.8431, 777.88007, 386.5282, 1002.10535, 691.8223]
2025-09-14 08:02:27,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [313.0, 318.0, 233.0, 292.0, 248.0, 336.0, 253.0, 138.0, 430.0, 245.0]
2025-09-14 08:02:27,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 6 hours, 50 minutes, 1 second)
2025-09-14 08:13:35,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:13:35,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:15:01,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 847.99670 ± 209.162
2025-09-14 08:15:01,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1120.5753, 818.58386, 899.4873, 833.0422, 1022.8364, 1034.062, 783.38367, 326.8595, 728.3839, 912.7528]
2025-09-14 08:15:01,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [361.0, 277.0, 279.0, 273.0, 390.0, 350.0, 270.0, 152.0, 259.0, 296.0]
2025-09-14 08:15:01,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 37 minutes, 58 seconds)
2025-09-14 08:26:18,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:26:18,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:27:40,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 807.16779 ± 115.159
2025-09-14 08:27:40,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [910.17346, 865.6229, 573.81915, 991.44995, 910.6817, 728.1785, 792.82214, 694.34686, 813.25055, 791.3327]
2025-09-14 08:27:40,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 278.0, 205.0, 315.0, 304.0, 255.0, 268.0, 259.0, 280.0, 277.0]
2025-09-14 08:27:40,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 26 minutes, 54 seconds)
2025-09-14 08:38:35,749 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:38:35,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:40:00,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 822.24792 ± 271.603
2025-09-14 08:40:00,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [673.8059, 1228.0013, 689.70685, 950.86304, 692.7163, 885.20593, 1029.8197, 936.2758, 962.27075, 173.81422]
2025-09-14 08:40:00,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 376.0, 245.0, 339.0, 244.0, 320.0, 342.0, 309.0, 316.0, 123.0]
2025-09-14 08:40:00,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 12 minutes, 57 seconds)
2025-09-14 08:51:22,707 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 08:51:22,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 08:52:39,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 734.57556 ± 260.059
2025-09-14 08:52:39,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [947.13196, 900.3171, 655.8843, 907.37103, 823.7136, 15.314432, 919.7496, 675.994, 795.5078, 704.77155]
2025-09-14 08:52:39,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [330.0, 287.0, 218.0, 308.0, 287.0, 29.0, 312.0, 248.0, 272.0, 249.0]
2025-09-14 08:52:39,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 2 minutes, 21 seconds)
2025-09-14 09:03:26,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:03:26,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:04:43,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 745.90607 ± 393.179
2025-09-14 09:04:43,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [954.61444, 999.9391, 683.4653, 1230.0017, 787.40265, 8.529495, 995.1478, 825.98083, 11.856758, 962.1226]
2025-09-14 09:04:43,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [338.0, 326.0, 290.0, 392.0, 262.0, 20.0, 364.0, 280.0, 26.0, 323.0]
2025-09-14 09:04:43,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 5 hours, 48 minutes, 44 seconds)
2025-09-14 09:15:47,315 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:15:47,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:17:12,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 859.79871 ± 166.581
2025-09-14 09:17:12,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [911.6239, 1261.1432, 828.08984, 769.2886, 866.04443, 698.8559, 736.74384, 881.55585, 995.7624, 648.8794]
2025-09-14 09:17:12,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 377.0, 290.0, 262.0, 282.0, 242.0, 251.0, 354.0, 338.0, 236.0]
2025-09-14 09:17:13,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 5 hours, 35 minutes, 51 seconds)
2025-09-14 09:28:19,849 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:28:19,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:29:49,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 930.14893 ± 138.130
2025-09-14 09:29:49,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1085.3901, 1106.5616, 976.6403, 674.3434, 884.9101, 739.64465, 904.0978, 940.2413, 1096.0594, 893.60077]
2025-09-14 09:29:49,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [355.0, 346.0, 307.0, 243.0, 273.0, 251.0, 292.0, 320.0, 349.0, 287.0]
2025-09-14 09:29:49,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (930.15) for latency ExtremeSparseL4U32
2025-09-14 09:29:49,304 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 23 minutes, 7 seconds)
2025-09-14 09:40:50,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:40:50,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:42:13,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 800.57843 ± 119.543
2025-09-14 09:42:13,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [800.699, 905.9588, 782.5533, 917.48926, 970.0941, 703.8285, 653.7939, 702.5961, 629.4002, 939.37134]
2025-09-14 09:42:13,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 317.0, 278.0, 309.0, 341.0, 244.0, 237.0, 230.0, 238.0, 330.0]
2025-09-14 09:42:13,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 11 minutes, 6 seconds)
2025-09-14 09:53:10,781 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 09:53:10,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 09:54:45,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 944.88641 ± 244.404
2025-09-14 09:54:45,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [700.0189, 1074.5216, 730.0704, 910.08124, 1067.3484, 933.9865, 1575.5697, 909.5315, 756.72943, 791.0065]
2025-09-14 09:54:45,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 355.0, 255.0, 300.0, 334.0, 328.0, 482.0, 302.0, 273.0, 280.0]
2025-09-14 09:54:45,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (944.89) for latency ExtremeSparseL4U32
2025-09-14 09:54:45,890 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 4 hours, 58 minutes, 7 seconds)
2025-09-14 10:05:34,602 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:05:34,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:06:47,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 708.77496 ± 387.893
2025-09-14 10:06:47,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [682.047, 897.1157, 739.74896, 586.1031, 947.6464, 962.8033, 9.361144, 1157.9259, 8.868899, 1096.1288]
2025-09-14 10:06:47,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [243.0, 282.0, 250.0, 206.0, 311.0, 303.0, 24.0, 368.0, 24.0, 382.0]
2025-09-14 10:06:47,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 4 hours, 45 minutes, 27 seconds)
2025-09-14 10:17:48,949 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:17:48,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:19:23,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 959.64795 ± 184.498
2025-09-14 10:19:23,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [434.84503, 1139.9288, 1016.4547, 988.3613, 972.3256, 1037.5605, 1073.4622, 990.16864, 908.36646, 1035.0059]
2025-09-14 10:19:23,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 364.0, 320.0, 329.0, 317.0, 339.0, 371.0, 322.0, 291.0, 333.0]
2025-09-14 10:19:23,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (959.65) for latency ExtremeSparseL4U32
2025-09-14 10:19:23,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 4 hours, 33 minutes, 33 seconds)
2025-09-14 10:30:26,270 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:30:26,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:32:00,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 966.47644 ± 172.129
2025-09-14 10:32:00,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [826.77045, 1249.3632, 759.92834, 984.10596, 978.307, 735.73016, 1097.1055, 1163.8696, 1080.3054, 789.2792]
2025-09-14 10:32:00,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [308.0, 410.0, 267.0, 319.0, 333.0, 258.0, 354.0, 367.0, 332.0, 277.0]
2025-09-14 10:32:00,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (966.48) for latency ExtremeSparseL4U32
2025-09-14 10:32:00,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 21 minutes, 12 seconds)
2025-09-14 10:42:59,257 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:42:59,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:44:31,881 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 916.98517 ± 110.321
2025-09-14 10:44:31,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [780.2912, 1092.3309, 973.0803, 977.1473, 798.3664, 723.5136, 956.4562, 1027.6838, 911.2505, 929.73193]
2025-09-14 10:44:31,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [274.0, 362.0, 323.0, 324.0, 280.0, 252.0, 343.0, 322.0, 287.0, 300.0]
2025-09-14 10:44:31,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 9 minutes, 11 seconds)
2025-09-14 10:55:35,261 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 10:55:35,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 10:57:08,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 988.74915 ± 197.699
2025-09-14 10:57:08,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1205.4795, 1327.3174, 805.9414, 737.04004, 971.356, 1123.9733, 881.69305, 707.0725, 986.95825, 1140.66]
2025-09-14 10:57:08,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [380.0, 418.0, 284.0, 254.0, 320.0, 356.0, 292.0, 260.0, 338.0, 360.0]
2025-09-14 10:57:08,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (988.75) for latency ExtremeSparseL4U32
2025-09-14 10:57:08,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 3 hours, 57 minutes, 3 seconds)
2025-09-14 11:07:36,624 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:07:36,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:09:09,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 979.80798 ± 152.980
2025-09-14 11:09:09,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1107.7749, 845.1828, 835.08673, 1276.0468, 917.5296, 1047.1381, 737.85266, 1008.5231, 1114.0475, 908.8973]
2025-09-14 11:09:09,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [340.0, 278.0, 272.0, 401.0, 312.0, 364.0, 265.0, 334.0, 350.0, 307.0]
2025-09-14 11:09:09,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 3 hours, 44 minutes, 33 seconds)
2025-09-14 11:19:40,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:19:40,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:21:16,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1032.41846 ± 196.831
2025-09-14 11:21:16,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1151.2874, 1197.658, 1368.1495, 718.2977, 1031.8915, 721.9337, 1019.7219, 977.70184, 941.4809, 1196.0623]
2025-09-14 11:21:16,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [365.0, 374.0, 414.0, 260.0, 329.0, 253.0, 329.0, 316.0, 301.0, 355.0]
2025-09-14 11:21:16,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1032.42) for latency ExtremeSparseL4U32
2025-09-14 11:21:16,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 30 minutes, 22 seconds)
2025-09-14 11:31:57,390 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:31:57,401 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:33:25,581 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 921.89636 ± 183.516
2025-09-14 11:33:25,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [701.9627, 648.5607, 1050.898, 1295.4276, 970.1334, 1053.675, 834.30695, 954.6672, 754.538, 954.7937]
2025-09-14 11:33:25,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 237.0, 338.0, 412.0, 311.0, 334.0, 278.0, 311.0, 256.0, 291.0]
2025-09-14 11:33:25,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 16 minutes, 30 seconds)
2025-09-14 11:43:45,835 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:43:45,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:45:30,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1095.41711 ± 466.854
2025-09-14 11:45:30,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [994.08484, 1180.3579, 885.2121, 840.73395, 2380.371, 1064.217, 937.9505, 1268.7542, 834.14844, 568.341]
2025-09-14 11:45:30,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [302.0, 370.0, 310.0, 282.0, 773.0, 368.0, 289.0, 384.0, 315.0, 226.0]
2025-09-14 11:45:30,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1095.42) for latency ExtremeSparseL4U32
2025-09-14 11:45:30,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 2 minutes, 56 seconds)
2025-09-14 11:55:56,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 11:55:56,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 11:57:25,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 934.05042 ± 96.665
2025-09-14 11:57:25,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [832.8543, 870.78406, 886.9982, 885.84576, 996.85455, 1057.283, 1086.7352, 969.2835, 987.50275, 766.3624]
2025-09-14 11:57:25,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [275.0, 291.0, 288.0, 301.0, 317.0, 337.0, 354.0, 322.0, 323.0, 267.0]
2025-09-14 11:57:25,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 2 hours, 48 minutes, 46 seconds)
2025-09-14 12:07:53,455 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:07:53,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:09:33,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1041.05896 ± 259.388
2025-09-14 12:09:33,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [911.8687, 1251.255, 1458.9124, 784.7031, 759.3669, 987.9601, 725.6363, 952.1821, 1460.7749, 1117.9294]
2025-09-14 12:09:33,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [311.0, 368.0, 439.0, 263.0, 273.0, 329.0, 249.0, 312.0, 488.0, 349.0]
2025-09-14 12:09:33,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 2 hours, 37 minutes)
2025-09-14 12:20:22,816 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:20:22,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:21:50,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 887.28662 ± 165.998
2025-09-14 12:21:50,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [669.46857, 796.38477, 1266.1892, 874.76404, 740.6558, 966.2713, 1020.5905, 976.1433, 770.97906, 791.4194]
2025-09-14 12:21:50,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 272.0, 403.0, 283.0, 276.0, 312.0, 361.0, 319.0, 260.0, 272.0]
2025-09-14 12:21:50,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 25 minutes, 23 seconds)
2025-09-14 12:32:03,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:32:03,014 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:33:34,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 953.32843 ± 206.932
2025-09-14 12:33:34,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [890.1335, 971.9212, 669.75574, 1127.4092, 1254.7881, 745.62415, 797.4913, 739.6082, 1076.7175, 1259.8347]
2025-09-14 12:33:34,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 301.0, 242.0, 372.0, 382.0, 253.0, 279.0, 253.0, 367.0, 387.0]
2025-09-14 12:33:34,274 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 12 minutes, 19 seconds)
2025-09-14 12:43:58,073 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:43:58,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:45:42,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1082.52454 ± 269.008
2025-09-14 12:45:42,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [723.90784, 738.36633, 1032.9355, 1224.8635, 1061.6251, 1378.8923, 1027.0878, 916.619, 1060.0542, 1660.8939]
2025-09-14 12:45:42,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 267.0, 325.0, 382.0, 337.0, 425.0, 338.0, 316.0, 369.0, 521.0]
2025-09-14 12:45:42,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 22 seconds)
2025-09-14 12:56:10,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 12:56:10,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 12:58:02,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1187.45374 ± 268.942
2025-09-14 12:58:02,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1026.1887, 1490.6375, 1557.0232, 1266.193, 1153.3838, 959.14606, 734.29346, 1592.5474, 1028.7124, 1066.4121]
2025-09-14 12:58:02,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [331.0, 439.0, 472.0, 376.0, 387.0, 317.0, 266.0, 515.0, 348.0, 355.0]
2025-09-14 12:58:02,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1187.45) for latency ExtremeSparseL4U32
2025-09-14 12:58:02,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 1 hour, 49 minutes, 6 seconds)
2025-09-14 13:08:38,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:08:38,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:09:59,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 865.47668 ± 356.743
2025-09-14 13:09:59,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1316.001, 1055.2993, 1043.8485, 1016.08826, 646.6435, 960.0835, 1225.3298, 10.069696, 691.66437, 689.7384]
2025-09-14 13:09:59,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [387.0, 336.0, 328.0, 335.0, 245.0, 306.0, 381.0, 22.0, 234.0, 247.0]
2025-09-14 13:09:59,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 36 minutes, 42 seconds)
2025-09-14 13:20:32,862 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:20:32,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:22:09,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 987.80450 ± 388.083
2025-09-14 13:22:09,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.378608, 1527.3091, 1207.701, 1072.935, 1221.146, 1132.4833, 1064.2137, 1017.89825, 988.9061, 633.07294]
2025-09-14 13:22:09,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 486.0, 395.0, 342.0, 379.0, 377.0, 339.0, 353.0, 346.0, 233.0]
2025-09-14 13:22:09,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 24 minutes, 26 seconds)
2025-09-14 13:32:31,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:32:31,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:34:06,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 965.30829 ± 133.088
2025-09-14 13:34:06,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [993.0519, 940.13983, 1077.353, 762.6768, 939.1722, 694.9413, 1004.83966, 1048.7296, 1148.9917, 1043.1869]
2025-09-14 13:34:06,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [336.0, 325.0, 372.0, 258.0, 296.0, 300.0, 335.0, 343.0, 373.0, 332.0]
2025-09-14 13:34:06,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 12 minutes, 38 seconds)
2025-09-14 13:44:59,763 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:44:59,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:46:45,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1104.45349 ± 744.493
2025-09-14 13:46:45,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [10.1699295, 775.8087, 1311.2872, 618.8982, 1016.5287, 1156.5391, 1893.1086, 699.62366, 721.52783, 2841.0435]
2025-09-14 13:46:45,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 293.0, 400.0, 241.0, 323.0, 364.0, 596.0, 251.0, 259.0, 844.0]
2025-09-14 13:46:45,072 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 1 minute, 2 seconds)
2025-09-14 13:57:03,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 13:57:03,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 13:58:52,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1193.28357 ± 709.324
2025-09-14 13:58:52,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1286.1895, 1130.6594, 1344.1979, 1125.2269, 8.564995, 17.706755, 1786.5552, 1329.6984, 1367.2246, 2536.813]
2025-09-14 13:58:52,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [383.0, 346.0, 425.0, 359.0, 20.0, 33.0, 549.0, 430.0, 437.0, 834.0]
2025-09-14 13:58:52,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1193.28) for latency ExtremeSparseL4U32
2025-09-14 13:58:52,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 48 minutes, 40 seconds)
2025-09-14 14:09:14,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:09:14,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:10:53,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1043.78210 ± 386.991
2025-09-14 14:10:53,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1450.4805, 9.212761, 925.5182, 1101.7856, 958.9002, 941.1665, 1400.2366, 1246.4338, 1267.5232, 1136.5638]
2025-09-14 14:10:53,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [472.0, 24.0, 307.0, 353.0, 309.0, 300.0, 423.0, 391.0, 435.0, 361.0]
2025-09-14 14:10:53,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 36 minutes, 32 seconds)
2025-09-14 14:21:51,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:21:51,441 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:24:00,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1409.42175 ± 281.815
2025-09-14 14:24:00,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1268.0586, 1149.8333, 1469.2258, 1570.1207, 1061.1506, 1144.7815, 1725.7524, 1333.9832, 2023.0951, 1348.2172]
2025-09-14 14:24:00,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [415.0, 363.0, 459.0, 515.0, 356.0, 372.0, 509.0, 421.0, 617.0, 413.0]
2025-09-14 14:24:00,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1226 [INFO]: New best (1409.42) for latency ExtremeSparseL4U32
2025-09-14 14:24:00,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 24 minutes, 44 seconds)
2025-09-14 14:34:28,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:34:28,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:36:29,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1329.41882 ± 260.612
2025-09-14 14:36:29,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [1850.0502, 1609.2797, 970.9853, 1482.9637, 1249.94, 1255.7339, 1202.0176, 947.35406, 1390.8604, 1335.004]
2025-09-14 14:36:29,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [589.0, 496.0, 315.0, 468.0, 386.0, 396.0, 356.0, 328.0, 456.0, 424.0]
2025-09-14 14:36:29,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 12 minutes, 28 seconds)
2025-09-14 14:46:49,332 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 14:46:49,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 14:48:50,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 1334.82568 ± 336.567
2025-09-14 14:48:50,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [997.0941, 1318.0056, 1137.3984, 1268.6069, 1568.1678, 791.74164, 1363.8654, 1392.5488, 2121.2778, 1389.5515]
2025-09-14 14:48:50,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [345.0, 394.0, 355.0, 414.0, 472.0, 277.0, 418.0, 444.0, 633.0, 448.0]
2025-09-14 14:48:50,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc10-walker2d):1251 [DEBUG]: Training session finished
