2025-09-13 04:59:45,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 04:59:45,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-humanoid/ExtremeSparseL4U32-mbpac-highdim-memdelay
2025-09-13 04:59:45,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15169fddd2d0>}
2025-09-13 04:59:45,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1111 [DEBUG]: using device: cuda
2025-09-13 04:59:45,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1133 [INFO]: Creating new trainer
2025-09-13 04:59:45,680 baseline-mbpac-noiseperc0-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-13 04:59:45,680 baseline-mbpac-noiseperc0-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-13 04:59:45,691 baseline-mbpac-noiseperc0-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
2025-09-13 04:59:47,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1194 [DEBUG]: Starting training session...
2025-09-13 04:59:47,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 1/100
2025-09-13 05:12:21,058 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:12:21,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:12:36,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 285.62259 ± 24.185
2025-09-13 05:12:36,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [306.73923, 244.36946, 333.35242, 278.03787, 291.32104, 294.31442, 293.94028, 255.79678, 268.79272, 289.5615]
2025-09-13 05:12:36,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 45.0, 63.0, 51.0, 54.0, 54.0, 54.0, 47.0, 49.0, 53.0]
2025-09-13 05:12:36,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (285.62) for latency ExtremeSparseL4U32
2025-09-13 05:12:36,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 21 hours, 9 minutes, 42 seconds)
2025-09-13 05:24:11,834 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:24:11,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:24:38,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 435.73517 ± 134.522
2025-09-13 05:24:38,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [417.11792, 405.2964, 789.05963, 304.73575, 296.82605, 433.38644, 372.17847, 424.00833, 375.9013, 538.8411]
2025-09-13 05:24:38,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 87.0, 155.0, 65.0, 55.0, 81.0, 69.0, 79.0, 82.0, 116.0]
2025-09-13 05:24:38,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (435.74) for latency ExtremeSparseL4U32
2025-09-13 05:24:38,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 17 minutes, 41 seconds)
2025-09-13 05:36:16,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:36:16,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:36:39,396 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 385.65060 ± 66.755
2025-09-13 05:36:39,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [289.8071, 324.25458, 439.39206, 310.6617, 469.81177, 397.84064, 446.37512, 336.9879, 360.05527, 481.32013]
2025-09-13 05:36:39,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 60.0, 88.0, 66.0, 94.0, 76.0, 85.0, 66.0, 67.0, 93.0]
2025-09-13 05:36:39,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 19 hours, 52 minutes)
2025-09-13 05:48:15,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 05:48:15,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 05:48:36,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 361.26929 ± 42.796
2025-09-13 05:48:36,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [293.77972, 299.36346, 354.52304, 345.73972, 395.58926, 376.63922, 349.4539, 390.0251, 361.1017, 446.47754]
2025-09-13 05:48:36,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 56.0, 66.0, 64.0, 73.0, 69.0, 64.0, 75.0, 67.0, 83.0]
2025-09-13 05:48:36,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 19 hours, 31 minutes, 33 seconds)
2025-09-13 06:00:14,726 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:00:14,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:00:37,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 403.12042 ± 72.964
2025-09-13 06:00:37,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [410.0262, 366.9095, 355.00653, 380.85703, 418.2984, 350.2793, 586.61646, 343.4874, 475.92172, 343.80173]
2025-09-13 06:00:37,577 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 68.0, 65.0, 71.0, 76.0, 63.0, 109.0, 63.0, 89.0, 63.0]
2025-09-13 06:00:37,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 19 hours, 15 minutes, 52 seconds)
2025-09-13 06:12:18,810 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:12:18,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:12:48,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 493.09677 ± 146.551
2025-09-13 06:12:48,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [331.31442, 850.417, 585.0951, 390.86874, 561.8103, 489.7039, 471.28085, 493.069, 451.59085, 305.8175]
2025-09-13 06:12:48,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 175.0, 121.0, 74.0, 105.0, 92.0, 91.0, 92.0, 95.0, 67.0]
2025-09-13 06:12:48,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (493.10) for latency ExtremeSparseL4U32
2025-09-13 06:12:48,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 18 hours, 51 minutes, 39 seconds)
2025-09-13 06:24:22,353 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:24:22,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:24:51,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 493.88867 ± 86.159
2025-09-13 06:24:51,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [536.0289, 532.7751, 368.06088, 600.6127, 379.78824, 611.29315, 552.5571, 519.9039, 390.96475, 446.90176]
2025-09-13 06:24:51,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 102.0, 80.0, 114.0, 70.0, 115.0, 117.0, 104.0, 73.0, 82.0]
2025-09-13 06:24:51,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (493.89) for latency ExtremeSparseL4U32
2025-09-13 06:24:51,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 18 hours, 40 minutes, 2 seconds)
2025-09-13 06:36:28,189 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:36:28,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:36:55,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 465.27362 ± 53.961
2025-09-13 06:36:55,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [408.72556, 431.50528, 553.8886, 453.2579, 492.49054, 503.77905, 443.9216, 541.354, 375.72006, 448.09375]
2025-09-13 06:36:55,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 93.0, 103.0, 85.0, 100.0, 95.0, 84.0, 102.0, 80.0, 82.0]
2025-09-13 06:36:55,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 18 hours, 29 minutes, 2 seconds)
2025-09-13 06:48:32,431 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 06:48:32,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 06:49:03,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 535.96161 ± 90.084
2025-09-13 06:49:03,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [440.9321, 610.85, 450.30466, 587.97394, 584.94183, 471.04297, 503.37686, 518.0959, 452.4287, 739.66956]
2025-09-13 06:49:03,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 117.0, 84.0, 121.0, 111.0, 88.0, 95.0, 96.0, 98.0, 146.0]
2025-09-13 06:49:03,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (535.96) for latency ExtremeSparseL4U32
2025-09-13 06:49:03,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 18 hours, 20 minutes, 13 seconds)
2025-09-13 07:00:38,906 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:00:38,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:01:11,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 553.12146 ± 154.045
2025-09-13 07:01:11,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [532.30054, 644.59186, 424.24982, 458.88577, 843.43317, 814.19965, 378.02466, 524.20996, 461.82812, 449.4907]
2025-09-13 07:01:11,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 137.0, 85.0, 86.0, 164.0, 168.0, 69.0, 98.0, 99.0, 86.0]
2025-09-13 07:01:11,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (553.12) for latency ExtremeSparseL4U32
2025-09-13 07:01:11,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 18 hours, 10 minutes, 18 seconds)
2025-09-13 07:12:49,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:12:49,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:13:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 488.05753 ± 67.774
2025-09-13 07:13:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [520.6577, 502.91284, 399.70212, 531.943, 467.55072, 448.3849, 367.89548, 620.04553, 514.86633, 506.6166]
2025-09-13 07:13:17,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 108.0, 74.0, 100.0, 89.0, 83.0, 79.0, 118.0, 96.0, 96.0]
2025-09-13 07:13:17,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 17 hours, 56 minutes, 39 seconds)
2025-09-13 07:24:54,887 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:24:54,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:25:24,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 495.43427 ± 62.472
2025-09-13 07:25:24,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [487.40714, 508.90378, 435.9324, 579.6655, 619.79034, 439.421, 500.98215, 426.14645, 523.3722, 432.72192]
2025-09-13 07:25:24,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 95.0, 95.0, 123.0, 117.0, 82.0, 94.0, 80.0, 99.0, 89.0]
2025-09-13 07:25:24,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 17 hours, 45 minutes, 40 seconds)
2025-09-13 07:36:59,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:36:59,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:37:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 528.86969 ± 131.888
2025-09-13 07:37:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [441.12595, 657.231, 413.61105, 532.10956, 836.30536, 384.66708, 430.37463, 606.85114, 455.55374, 530.86725]
2025-09-13 07:37:30,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 137.0, 77.0, 99.0, 165.0, 71.0, 79.0, 122.0, 85.0, 111.0]
2025-09-13 07:37:30,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 17 hours, 34 minutes, 10 seconds)
2025-09-13 07:49:05,184 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 07:49:05,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 07:49:35,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 514.69287 ± 99.506
2025-09-13 07:49:35,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [439.12137, 638.4955, 687.4115, 507.04916, 444.31604, 403.54016, 505.6059, 592.95087, 562.89966, 365.538]
2025-09-13 07:49:35,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 127.0, 129.0, 96.0, 83.0, 74.0, 96.0, 118.0, 115.0, 79.0]
2025-09-13 07:49:35,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 17 hours, 21 minutes, 18 seconds)
2025-09-13 08:01:15,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:01:15,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:01:45,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 515.45251 ± 64.908
2025-09-13 08:01:45,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [560.8746, 543.78424, 598.9101, 538.3483, 521.82666, 613.4967, 418.08817, 427.96268, 468.8395, 462.39413]
2025-09-13 08:01:45,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 102.0, 122.0, 102.0, 112.0, 115.0, 76.0, 79.0, 85.0, 86.0]
2025-09-13 08:01:45,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 17 hours, 9 minutes, 30 seconds)
2025-09-13 08:13:20,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:13:20,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:13:48,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 479.99454 ± 55.816
2025-09-13 08:13:48,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [508.63602, 421.9278, 405.70682, 472.66537, 448.65952, 504.43103, 460.6729, 601.6402, 440.27286, 535.3327]
2025-09-13 08:13:48,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 78.0, 86.0, 103.0, 90.0, 109.0, 87.0, 113.0, 95.0, 100.0]
2025-09-13 08:13:48,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 16 hours, 56 minutes, 41 seconds)
2025-09-13 08:25:25,793 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:25:25,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:25:58,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 566.47650 ± 76.040
2025-09-13 08:25:58,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [630.1807, 635.36694, 531.42456, 603.224, 586.36224, 575.44916, 360.19034, 545.588, 577.016, 619.9631]
2025-09-13 08:25:58,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 122.0, 98.0, 114.0, 111.0, 109.0, 78.0, 102.0, 117.0, 117.0]
2025-09-13 08:25:58,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (566.48) for latency ExtremeSparseL4U32
2025-09-13 08:25:58,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 16 hours, 45 minutes, 22 seconds)
2025-09-13 08:37:31,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:37:31,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:38:03,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 550.64716 ± 63.272
2025-09-13 08:38:03,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [597.1833, 591.987, 530.489, 435.91345, 556.4439, 662.56635, 588.00543, 557.34094, 461.60928, 524.9325]
2025-09-13 08:38:03,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 112.0, 112.0, 84.0, 105.0, 124.0, 113.0, 104.0, 100.0, 98.0]
2025-09-13 08:38:03,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 16 hours, 32 minutes, 53 seconds)
2025-09-13 08:49:41,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 08:49:41,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 08:50:22,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 668.24310 ± 221.185
2025-09-13 08:50:22,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [468.37106, 1160.9756, 535.91675, 450.6803, 714.0273, 507.11823, 852.0923, 894.9287, 553.95917, 544.362]
2025-09-13 08:50:22,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 230.0, 100.0, 94.0, 140.0, 102.0, 166.0, 186.0, 103.0, 102.0]
2025-09-13 08:50:22,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (668.24) for latency ExtremeSparseL4U32
2025-09-13 08:50:22,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 16 hours, 24 minutes, 39 seconds)
2025-09-13 09:02:01,704 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:02:01,710 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:02:37,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 609.21613 ± 117.910
2025-09-13 09:02:37,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [518.7416, 563.01746, 578.23706, 530.7946, 568.79816, 462.49203, 863.38544, 576.6683, 787.08905, 642.93726]
2025-09-13 09:02:37,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 105.0, 112.0, 107.0, 110.0, 98.0, 164.0, 110.0, 147.0, 124.0]
2025-09-13 09:02:37,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 16 hours, 13 minutes, 49 seconds)
2025-09-13 09:14:08,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:14:08,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:14:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 686.66052 ± 239.934
2025-09-13 09:14:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [692.1454, 956.4825, 492.20993, 1266.1259, 540.03973, 457.68753, 559.47095, 549.6709, 776.8251, 575.9468]
2025-09-13 09:14:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 186.0, 102.0, 255.0, 99.0, 86.0, 107.0, 103.0, 151.0, 111.0]
2025-09-13 09:14:49,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (686.66) for latency ExtremeSparseL4U32
2025-09-13 09:14:49,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 16 hours, 3 minutes, 58 seconds)
2025-09-13 09:26:32,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:26:32,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:27:09,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 611.09558 ± 145.490
2025-09-13 09:27:09,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [663.551, 552.80426, 451.53302, 464.518, 461.99426, 814.34204, 655.98846, 870.81714, 698.3243, 477.08344]
2025-09-13 09:27:09,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 111.0, 90.0, 94.0, 87.0, 156.0, 135.0, 163.0, 142.0, 94.0]
2025-09-13 09:27:09,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 15 hours, 54 minutes, 22 seconds)
2025-09-13 09:38:46,534 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:38:46,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:39:19,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 573.25952 ± 162.171
2025-09-13 09:39:19,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [573.9838, 551.2433, 514.5817, 1006.2132, 461.55536, 665.0583, 492.77716, 460.3146, 607.66376, 399.204]
2025-09-13 09:39:19,189 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 103.0, 95.0, 192.0, 86.0, 123.0, 92.0, 94.0, 113.0, 86.0]
2025-09-13 09:39:19,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 15 hours, 43 minutes, 26 seconds)
2025-09-13 09:50:50,636 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 09:50:50,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 09:51:21,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 545.75464 ± 85.077
2025-09-13 09:51:21,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [665.7455, 448.78104, 509.52698, 690.53906, 512.88293, 564.419, 508.756, 498.13782, 631.1219, 427.63614]
2025-09-13 09:51:21,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 84.0, 103.0, 131.0, 94.0, 109.0, 109.0, 91.0, 118.0, 79.0]
2025-09-13 09:51:21,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 15 hours, 26 minutes, 57 seconds)
2025-09-13 10:02:56,439 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:02:56,446 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:03:39,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 722.94531 ± 112.023
2025-09-13 10:03:39,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [848.24786, 896.92804, 685.6972, 574.06445, 745.51733, 537.8119, 634.9113, 791.898, 813.4625, 700.9143]
2025-09-13 10:03:39,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 187.0, 147.0, 108.0, 147.0, 107.0, 121.0, 170.0, 152.0, 144.0]
2025-09-13 10:03:39,488 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (722.95) for latency ExtremeSparseL4U32
2025-09-13 10:03:39,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 15 hours, 15 minutes, 32 seconds)
2025-09-13 10:15:22,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:15:22,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:16:00,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 636.41125 ± 157.142
2025-09-13 10:16:00,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [860.16095, 656.55286, 556.4174, 582.57056, 454.70752, 962.9801, 567.7255, 444.34723, 693.55334, 585.09784]
2025-09-13 10:16:00,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 125.0, 110.0, 121.0, 93.0, 189.0, 119.0, 97.0, 131.0, 124.0]
2025-09-13 10:16:00,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 15 hours, 5 minutes, 26 seconds)
2025-09-13 10:27:36,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:27:36,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:28:18,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 708.98645 ± 237.110
2025-09-13 10:28:18,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [514.10864, 965.1453, 637.98553, 624.5266, 945.911, 638.7487, 517.8171, 590.94183, 435.62695, 1219.0532]
2025-09-13 10:28:18,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 178.0, 124.0, 123.0, 188.0, 120.0, 103.0, 109.0, 88.0, 238.0]
2025-09-13 10:28:18,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 14 hours, 52 minutes, 49 seconds)
2025-09-13 10:39:55,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:39:55,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:40:40,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 756.47668 ± 190.079
2025-09-13 10:40:40,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [698.55756, 498.0671, 768.1093, 1126.4581, 1044.5834, 755.5182, 593.85834, 565.0521, 820.7376, 693.8253]
2025-09-13 10:40:40,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 108.0, 147.0, 215.0, 203.0, 162.0, 112.0, 124.0, 156.0, 150.0]
2025-09-13 10:40:40,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (756.48) for latency ExtremeSparseL4U32
2025-09-13 10:40:40,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 14 hours, 43 minutes, 33 seconds)
2025-09-13 10:52:14,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 10:52:14,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 10:52:55,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 725.04358 ± 201.712
2025-09-13 10:52:55,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [794.0608, 694.3656, 980.63745, 738.07684, 1046.0999, 524.96625, 938.8124, 585.4564, 499.64362, 448.31644]
2025-09-13 10:52:55,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 129.0, 186.0, 136.0, 195.0, 97.0, 181.0, 106.0, 91.0, 84.0]
2025-09-13 10:52:55,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 14 hours, 34 minutes, 11 seconds)
2025-09-13 11:04:43,984 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:04:43,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:05:29,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 778.49963 ± 231.590
2025-09-13 11:05:29,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [526.03253, 1203.989, 623.563, 715.6254, 731.78613, 545.67566, 785.5335, 1204.2672, 824.26556, 624.25824]
2025-09-13 11:05:29,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 246.0, 117.0, 144.0, 139.0, 118.0, 147.0, 226.0, 155.0, 114.0]
2025-09-13 11:05:29,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (778.50) for latency ExtremeSparseL4U32
2025-09-13 11:05:29,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 14 hours, 25 minutes, 33 seconds)
2025-09-13 11:17:03,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:17:03,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:17:47,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 777.39288 ± 228.884
2025-09-13 11:17:47,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [698.1233, 1220.8795, 866.2947, 872.8064, 538.1522, 580.2654, 635.2303, 995.90955, 930.198, 436.06924]
2025-09-13 11:17:47,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 241.0, 180.0, 164.0, 96.0, 112.0, 115.0, 181.0, 177.0, 81.0]
2025-09-13 11:17:47,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 14 hours, 12 minutes, 34 seconds)
2025-09-13 11:29:26,665 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:29:26,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:30:11,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 779.41962 ± 185.729
2025-09-13 11:30:11,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [709.71405, 736.7185, 798.69275, 600.0582, 608.2997, 688.32227, 1226.6154, 613.0195, 842.19666, 970.55884]
2025-09-13 11:30:11,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 142.0, 150.0, 129.0, 111.0, 128.0, 255.0, 115.0, 157.0, 183.0]
2025-09-13 11:30:11,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (779.42) for latency ExtremeSparseL4U32
2025-09-13 11:30:11,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 14 hours, 1 minute, 41 seconds)
2025-09-13 11:41:46,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:41:46,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:42:35,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 844.21075 ± 256.225
2025-09-13 11:42:35,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [824.3312, 786.2798, 499.37134, 906.2734, 758.74493, 1045.201, 480.50406, 1236.3944, 1246.3984, 658.6087]
2025-09-13 11:42:35,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 149.0, 94.0, 181.0, 143.0, 196.0, 89.0, 235.0, 235.0, 123.0]
2025-09-13 11:42:35,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (844.21) for latency ExtremeSparseL4U32
2025-09-13 11:42:35,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 13 hours, 49 minutes, 34 seconds)
2025-09-13 11:54:12,643 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 11:54:12,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 11:55:11,786 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1005.85956 ± 416.283
2025-09-13 11:55:11,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1057.5531, 1163.7982, 725.1261, 1966.7894, 725.2093, 493.60098, 762.5354, 823.099, 842.45764, 1498.4269]
2025-09-13 11:55:11,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 220.0, 138.0, 384.0, 152.0, 102.0, 157.0, 165.0, 174.0, 286.0]
2025-09-13 11:55:11,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1005.86) for latency ExtremeSparseL4U32
2025-09-13 11:55:11,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 13 hours, 41 minutes, 57 seconds)
2025-09-13 12:06:46,413 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:06:46,430 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:07:37,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 876.55988 ± 259.085
2025-09-13 12:07:37,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1138.8727, 1152.9387, 1256.7386, 908.901, 921.95355, 502.86066, 982.2376, 494.9244, 598.77264, 807.3993]
2025-09-13 12:07:37,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 218.0, 248.0, 172.0, 177.0, 93.0, 184.0, 94.0, 120.0, 155.0]
2025-09-13 12:07:37,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 13 hours, 27 minutes, 44 seconds)
2025-09-13 12:19:09,701 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:19:09,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:20:07,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 990.27771 ± 176.884
2025-09-13 12:20:07,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1089.8295, 979.1645, 1037.8282, 668.3197, 722.1096, 998.7936, 964.57825, 1128.4895, 1314.464, 999.1999]
2025-09-13 12:20:07,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 214.0, 198.0, 140.0, 145.0, 189.0, 181.0, 216.0, 255.0, 185.0]
2025-09-13 12:20:07,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 13 hours, 18 minutes, 2 seconds)
2025-09-13 12:31:52,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:31:52,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:32:38,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 797.98596 ± 208.518
2025-09-13 12:32:38,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [747.2736, 1298.525, 562.518, 684.1237, 919.2306, 589.5706, 648.5381, 714.3667, 936.7496, 878.9632]
2025-09-13 12:32:38,235 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 253.0, 103.0, 138.0, 172.0, 108.0, 140.0, 135.0, 175.0, 165.0]
2025-09-13 12:32:38,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 13 hours, 6 minutes, 48 seconds)
2025-09-13 12:44:15,286 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:44:15,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:45:06,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 888.48749 ± 297.227
2025-09-13 12:45:06,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [675.7955, 567.2027, 713.415, 1011.03436, 1181.4457, 711.4191, 1459.955, 659.12634, 648.8572, 1256.6243]
2025-09-13 12:45:06,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 107.0, 145.0, 188.0, 220.0, 145.0, 276.0, 122.0, 134.0, 236.0]
2025-09-13 12:45:06,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 55 minutes, 20 seconds)
2025-09-13 12:56:50,394 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 12:56:50,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 12:57:51,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1008.41211 ± 280.386
2025-09-13 12:57:51,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1133.6005, 781.17413, 730.6324, 1084.6288, 888.77057, 1444.8215, 703.83136, 1439.4763, 671.4056, 1205.7803]
2025-09-13 12:57:51,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 155.0, 136.0, 221.0, 186.0, 280.0, 151.0, 291.0, 126.0, 236.0]
2025-09-13 12:57:51,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1008.41) for latency ExtremeSparseL4U32
2025-09-13 12:57:51,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 12 hours, 44 minutes, 28 seconds)
2025-09-13 13:09:21,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:09:21,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:10:13,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 907.53381 ± 237.948
2025-09-13 13:10:13,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [618.57336, 660.9669, 1304.8105, 773.44836, 1071.2423, 926.17993, 878.27856, 1263.4307, 611.128, 967.2798]
2025-09-13 13:10:13,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 125.0, 249.0, 144.0, 213.0, 170.0, 166.0, 245.0, 119.0, 207.0]
2025-09-13 13:10:13,556 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 12 hours, 31 minutes, 18 seconds)
2025-09-13 13:21:58,541 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:21:58,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:23:24,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1471.84741 ± 686.664
2025-09-13 13:23:24,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [760.4899, 1182.1931, 2941.3486, 1020.9222, 2331.0906, 1111.4788, 928.98267, 2083.412, 1344.06, 1014.4957]
2025-09-13 13:23:24,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 222.0, 603.0, 193.0, 441.0, 208.0, 188.0, 403.0, 258.0, 199.0]
2025-09-13 13:23:24,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1471.85) for latency ExtremeSparseL4U32
2025-09-13 13:23:24,628 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 12 hours, 26 minutes, 41 seconds)
2025-09-13 13:34:50,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:34:50,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:35:40,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 857.39417 ± 244.911
2025-09-13 13:35:40,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [681.5155, 519.07904, 733.0283, 696.2073, 540.44446, 1128.8422, 1067.2306, 869.1312, 1159.7903, 1178.6724]
2025-09-13 13:35:40,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 98.0, 135.0, 127.0, 100.0, 216.0, 199.0, 178.0, 223.0, 243.0]
2025-09-13 13:35:41,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 12 hours, 11 minutes, 19 seconds)
2025-09-13 13:47:14,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 13:47:14,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 13:48:24,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1148.59326 ± 415.031
2025-09-13 13:48:24,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [790.7486, 845.0599, 641.0228, 1803.2946, 1346.2235, 812.30347, 1571.9545, 1781.4896, 909.65186, 984.1848]
2025-09-13 13:48:24,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 160.0, 120.0, 366.0, 264.0, 147.0, 306.0, 350.0, 188.0, 190.0]
2025-09-13 13:48:24,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 12 hours, 1 minute, 33 seconds)
2025-09-13 14:00:03,819 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:00:03,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:01:12,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1179.74780 ± 404.906
2025-09-13 14:01:12,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1336.3649, 1271.15, 1811.3262, 514.0193, 1299.0063, 652.5793, 790.4737, 1358.1819, 1703.9442, 1060.4324]
2025-09-13 14:01:12,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 241.0, 346.0, 110.0, 263.0, 124.0, 150.0, 274.0, 328.0, 200.0]
2025-09-13 14:01:12,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 49 minutes, 31 seconds)
2025-09-13 14:12:55,378 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:12:55,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:14:08,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1235.98718 ± 539.694
2025-09-13 14:14:08,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [545.30334, 1345.895, 1108.3967, 908.4858, 1047.3668, 679.08887, 1673.5222, 822.65875, 1959.3258, 2269.8284]
2025-09-13 14:14:08,699 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 251.0, 207.0, 171.0, 218.0, 134.0, 343.0, 154.0, 386.0, 445.0]
2025-09-13 14:14:08,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 43 minutes, 6 seconds)
2025-09-13 14:25:40,400 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:25:40,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:26:28,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 818.59393 ± 476.700
2025-09-13 14:26:28,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [917.7031, 516.454, 888.85205, 932.58563, 438.91702, 403.02094, 2037.6328, 497.825, 427.16214, 1125.7869]
2025-09-13 14:26:28,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 107.0, 169.0, 173.0, 87.0, 83.0, 403.0, 97.0, 88.0, 207.0]
2025-09-13 14:26:28,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 11 hours, 21 minutes, 7 seconds)
2025-09-13 14:38:11,969 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:38:11,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:39:17,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1043.82166 ± 388.520
2025-09-13 14:39:17,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1569.2289, 833.7699, 1368.0117, 1297.433, 448.36575, 611.3638, 1336.7416, 988.1864, 554.2989, 1430.8175]
2025-09-13 14:39:17,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [320.0, 176.0, 273.0, 274.0, 91.0, 140.0, 268.0, 190.0, 111.0, 283.0]
2025-09-13 14:39:17,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 11 hours, 14 minutes, 14 seconds)
2025-09-13 14:50:49,112 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 14:50:49,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 14:52:45,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1939.12427 ± 803.779
2025-09-13 14:52:45,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2451.1208, 2609.7488, 709.9157, 3363.429, 1246.2185, 1465.5087, 2735.28, 1798.3724, 1018.21716, 1993.4324]
2025-09-13 14:52:45,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [495.0, 503.0, 132.0, 671.0, 251.0, 288.0, 514.0, 339.0, 189.0, 387.0]
2025-09-13 14:52:45,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (1939.12) for latency ExtremeSparseL4U32
2025-09-13 14:52:45,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 11 hours, 9 minutes, 12 seconds)
2025-09-13 15:04:30,175 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:04:30,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:06:18,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1888.54272 ± 886.593
2025-09-13 15:06:18,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [917.72076, 1115.7427, 2759.8906, 2367.356, 2039.6395, 3052.712, 801.6415, 3234.3315, 1661.9318, 934.46075]
2025-09-13 15:06:18,545 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 216.0, 534.0, 447.0, 405.0, 617.0, 167.0, 601.0, 304.0, 175.0]
2025-09-13 15:06:18,557 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 11 hours, 4 minutes, 1 second)
2025-09-13 15:17:43,615 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:17:43,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:18:59,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1281.85925 ± 727.845
2025-09-13 15:18:59,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [699.66864, 1875.185, 901.5943, 1189.7649, 2657.3918, 460.32047, 1123.8657, 587.5416, 923.0485, 2400.212]
2025-09-13 15:18:59,021 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 345.0, 187.0, 225.0, 504.0, 87.0, 213.0, 107.0, 174.0, 454.0]
2025-09-13 15:18:59,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 48 minutes, 23 seconds)
2025-09-13 15:30:38,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:30:38,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:31:31,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 919.25049 ± 429.108
2025-09-13 15:31:31,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [343.74716, 792.55505, 896.08234, 411.8305, 411.35544, 1132.9736, 1090.3214, 1401.8389, 975.8011, 1735.9998]
2025-09-13 15:31:31,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 149.0, 166.0, 77.0, 81.0, 212.0, 207.0, 270.0, 186.0, 330.0]
2025-09-13 15:31:31,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 37 minutes, 32 seconds)
2025-09-13 15:43:25,324 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:43:25,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:45:05,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1751.17017 ± 771.186
2025-09-13 15:45:05,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1892.253, 1232.0754, 2405.4263, 3046.2407, 981.75397, 1163.4386, 992.1573, 1301.2799, 3067.6772, 1429.3982]
2025-09-13 15:45:05,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [345.0, 239.0, 462.0, 581.0, 190.0, 222.0, 210.0, 245.0, 593.0, 274.0]
2025-09-13 15:45:05,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 31 minutes, 42 seconds)
2025-09-13 15:56:55,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 15:56:55,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 15:58:25,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1530.73279 ± 739.445
2025-09-13 15:58:25,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1635.6042, 427.5634, 2448.3833, 1554.946, 2551.6636, 690.5392, 1023.051, 1461.0358, 956.1241, 2558.4165]
2025-09-13 15:58:25,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 77.0, 489.0, 311.0, 475.0, 147.0, 188.0, 285.0, 180.0, 511.0]
2025-09-13 15:58:25,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 17 minutes, 13 seconds)
2025-09-13 16:09:40,724 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:09:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:10:39,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1000.60956 ± 461.107
2025-09-13 16:10:39,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1273.1323, 403.31613, 589.30664, 930.80963, 520.63434, 1239.3438, 772.5288, 1263.4742, 2054.601, 958.9493]
2025-09-13 16:10:39,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 80.0, 131.0, 174.0, 107.0, 236.0, 146.0, 238.0, 396.0, 182.0]
2025-09-13 16:10:39,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 51 minutes, 57 seconds)
2025-09-13 16:22:30,147 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:22:30,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:24:17,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1881.93970 ± 1287.861
2025-09-13 16:24:17,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2418.309, 1319.1353, 706.0634, 695.1473, 1824.506, 5147.8794, 908.5662, 2516.678, 2367.7393, 915.3727]
2025-09-13 16:24:17,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [472.0, 269.0, 131.0, 124.0, 337.0, 984.0, 167.0, 477.0, 472.0, 178.0]
2025-09-13 16:24:17,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 47 minutes, 50 seconds)
2025-09-13 16:35:58,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:35:58,502 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:37:21,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1414.74341 ± 725.625
2025-09-13 16:37:21,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1546.2448, 1498.6465, 545.02576, 753.73303, 1057.4215, 1933.9166, 2501.8855, 2710.9485, 673.1209, 926.4904]
2025-09-13 16:37:21,461 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [304.0, 305.0, 115.0, 139.0, 209.0, 363.0, 509.0, 537.0, 141.0, 170.0]
2025-09-13 16:37:21,469 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 39 minutes, 15 seconds)
2025-09-13 16:49:02,786 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 16:49:02,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 16:50:46,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1762.44470 ± 1526.695
2025-09-13 16:50:46,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [882.3538, 429.1697, 3795.287, 2470.6477, 1635.9203, 1599.3279, 5169.156, 714.7041, 505.91376, 421.96762]
2025-09-13 16:50:46,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 88.0, 737.0, 487.0, 328.0, 322.0, 1000.0, 129.0, 93.0, 84.0]
2025-09-13 16:50:46,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 24 minutes, 47 seconds)
2025-09-13 17:02:11,245 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:02:11,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:03:41,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1581.85852 ± 788.994
2025-09-13 17:03:41,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [657.2835, 2030.0515, 501.21396, 2003.7308, 1262.7107, 582.39026, 3136.3179, 1742.6257, 2014.053, 1888.2079]
2025-09-13 17:03:41,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 379.0, 94.0, 382.0, 254.0, 113.0, 592.0, 316.0, 369.0, 382.0]
2025-09-13 17:03:41,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 8 minutes, 15 seconds)
2025-09-13 17:15:19,603 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:15:19,611 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:16:49,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1540.29321 ± 898.524
2025-09-13 17:16:49,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3452.7354, 346.32083, 2256.025, 964.4871, 1216.9597, 627.9008, 1883.509, 1832.8403, 713.6385, 2108.5154]
2025-09-13 17:16:49,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [658.0, 65.0, 442.0, 183.0, 249.0, 116.0, 389.0, 349.0, 135.0, 409.0]
2025-09-13 17:16:49,067 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 2 minutes, 33 seconds)
2025-09-13 17:28:19,154 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:28:19,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:30:01,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1810.33228 ± 1359.884
2025-09-13 17:30:01,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5031.298, 757.5049, 621.5121, 3227.901, 1336.0216, 438.42188, 2680.3396, 1488.1825, 1349.9558, 1172.1864]
2025-09-13 17:30:01,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [956.0, 153.0, 118.0, 621.0, 246.0, 79.0, 497.0, 280.0, 257.0, 220.0]
2025-09-13 17:30:01,962 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 8 hours, 45 minutes, 52 seconds)
2025-09-13 17:41:36,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:41:37,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:43:37,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2128.32349 ± 960.041
2025-09-13 17:43:37,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2163.4177, 1676.9286, 2308.9585, 734.9236, 2785.6426, 2194.9768, 1240.8171, 4291.716, 1213.3997, 2672.454]
2025-09-13 17:43:37,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [411.0, 315.0, 439.0, 131.0, 531.0, 415.0, 261.0, 794.0, 235.0, 496.0]
2025-09-13 17:43:37,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2128.32) for latency ExtremeSparseL4U32
2025-09-13 17:43:37,779 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 36 minutes, 55 seconds)
2025-09-13 17:55:18,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 17:55:18,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 17:57:27,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2168.21338 ± 1210.612
2025-09-13 17:57:27,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1687.4006, 1501.287, 1455.1808, 4331.8213, 1343.2773, 2580.6025, 1276.0659, 1716.8645, 1179.5773, 4610.0566]
2025-09-13 17:57:27,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [325.0, 282.0, 281.0, 828.0, 250.0, 508.0, 235.0, 361.0, 229.0, 897.0]
2025-09-13 17:57:27,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2168.21) for latency ExtremeSparseL4U32
2025-09-13 17:57:27,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 26 minutes, 48 seconds)
2025-09-13 18:09:15,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:09:15,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:11:13,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2093.03735 ± 1432.941
2025-09-13 18:11:13,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3143.4873, 665.0955, 611.8571, 1754.0078, 2295.7634, 4943.5254, 3486.4119, 981.08264, 281.3365, 2767.806]
2025-09-13 18:11:13,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [582.0, 145.0, 122.0, 324.0, 454.0, 909.0, 645.0, 184.0, 55.0, 529.0]
2025-09-13 18:11:13,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 19 minutes, 50 seconds)
2025-09-13 18:22:43,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:22:43,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:24:42,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2068.21729 ± 1395.754
2025-09-13 18:24:42,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1376.0681, 1715.4375, 5160.6274, 1280.8452, 2554.7812, 3289.9749, 534.95905, 608.8529, 965.7579, 3194.869]
2025-09-13 18:24:42,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 329.0, 974.0, 241.0, 482.0, 624.0, 98.0, 112.0, 182.0, 604.0]
2025-09-13 18:24:42,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 8 minutes, 49 seconds)
2025-09-13 18:36:32,242 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:36:32,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:37:45,606 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1234.96008 ± 475.573
2025-09-13 18:37:45,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1737.8245, 518.2294, 1278.4697, 924.329, 2345.7275, 888.30853, 1151.0383, 1285.7107, 1079.0931, 1140.8701]
2025-09-13 18:37:45,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [344.0, 95.0, 243.0, 170.0, 470.0, 174.0, 218.0, 256.0, 234.0, 213.0]
2025-09-13 18:37:45,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 7 hours, 54 minutes, 5 seconds)
2025-09-13 18:49:53,962 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 18:49:53,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 18:51:07,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1264.64856 ± 811.105
2025-09-13 18:51:07,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [659.40546, 674.3742, 3010.0217, 325.6265, 468.365, 2138.9077, 1496.0715, 1416.4088, 1707.4564, 749.84894]
2025-09-13 18:51:07,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 130.0, 589.0, 66.0, 86.0, 408.0, 297.0, 275.0, 332.0, 152.0]
2025-09-13 18:51:07,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 38 minutes, 57 seconds)
2025-09-13 19:02:07,651 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:02:07,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:03:49,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1720.32849 ± 1079.826
2025-09-13 19:03:49,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2383.247, 1477.2188, 3333.267, 461.91296, 1507.2463, 1587.045, 1084.6888, 877.3583, 625.96387, 3865.3362]
2025-09-13 19:03:49,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [440.0, 281.0, 643.0, 89.0, 293.0, 312.0, 212.0, 162.0, 115.0, 754.0]
2025-09-13 19:03:49,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 18 minutes, 1 second)
2025-09-13 19:15:28,626 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:15:28,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:16:07,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 634.83783 ± 178.478
2025-09-13 19:16:07,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [418.4568, 941.08075, 600.84503, 412.66226, 780.5293, 664.8759, 868.1577, 684.3426, 542.0898, 435.3383]
2025-09-13 19:16:07,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 189.0, 124.0, 84.0, 161.0, 136.0, 185.0, 127.0, 108.0, 87.0]
2025-09-13 19:16:07,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 6 hours, 55 minutes, 20 seconds)
2025-09-13 19:27:49,463 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:27:49,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:28:51,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1077.76501 ± 436.840
2025-09-13 19:28:51,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1368.9678, 1125.4084, 661.0921, 886.2056, 544.41376, 726.02606, 1909.9415, 1532.3855, 627.6956, 1395.5133]
2025-09-13 19:28:51,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 216.0, 127.0, 160.0, 99.0, 150.0, 388.0, 291.0, 121.0, 262.0]
2025-09-13 19:28:51,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 6 hours, 37 minutes, 44 seconds)
2025-09-13 19:40:54,192 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:40:54,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:42:51,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2062.37451 ± 780.656
2025-09-13 19:42:51,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1469.9286, 585.20874, 2470.5867, 2505.6453, 1357.1433, 3207.74, 2782.6382, 2774.9395, 2006.9518, 1462.9645]
2025-09-13 19:42:51,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [283.0, 108.0, 467.0, 491.0, 263.0, 608.0, 518.0, 533.0, 363.0, 293.0]
2025-09-13 19:42:51,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 30 minutes, 36 seconds)
2025-09-13 19:54:07,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 19:54:07,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 19:56:24,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2327.46899 ± 1054.534
2025-09-13 19:56:24,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1001.3404, 3584.7065, 2960.2163, 2010.2887, 3147.7593, 942.4423, 3639.9507, 1905.5784, 895.6649, 3186.744]
2025-09-13 19:56:24,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 702.0, 599.0, 390.0, 613.0, 172.0, 698.0, 358.0, 184.0, 603.0]
2025-09-13 19:56:24,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2327.47) for latency ExtremeSparseL4U32
2025-09-13 19:56:24,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 18 minutes, 39 seconds)
2025-09-13 20:08:14,513 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:08:14,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:10:34,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2374.21436 ± 1419.261
2025-09-13 20:10:34,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1334.0873, 5173.4585, 2719.0054, 773.11084, 1163.0695, 921.7459, 4488.5327, 2461.3928, 1860.0825, 2847.6582]
2025-09-13 20:10:34,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [259.0, 1000.0, 519.0, 143.0, 217.0, 181.0, 906.0, 472.0, 355.0, 536.0]
2025-09-13 20:10:34,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2374.21) for latency ExtremeSparseL4U32
2025-09-13 20:10:34,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 13 minutes, 50 seconds)
2025-09-13 20:22:15,122 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:22:15,131 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:24:50,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2719.23120 ± 1681.957
2025-09-13 20:24:50,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5411.347, 474.58945, 4721.102, 1013.52094, 4190.14, 4416.7812, 1673.4175, 2042.5537, 1408.9111, 1839.949]
2025-09-13 20:24:50,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 88.0, 870.0, 186.0, 772.0, 818.0, 327.0, 390.0, 261.0, 343.0]
2025-09-13 20:24:50,009 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (2719.23) for latency ExtremeSparseL4U32
2025-09-13 20:24:50,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 11 minutes)
2025-09-13 20:36:24,218 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:36:24,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:38:47,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2413.52124 ± 1901.024
2025-09-13 20:38:47,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5158.594, 1574.0293, 2305.9912, 773.25836, 5069.1187, 3526.5896, 382.0742, 443.91, 341.71436, 4559.931]
2025-09-13 20:38:47,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 308.0, 449.0, 155.0, 1000.0, 675.0, 80.0, 80.0, 68.0, 892.0]
2025-09-13 20:38:47,714 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 3 minutes, 39 seconds)
2025-09-13 20:50:41,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 20:50:41,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 20:52:00,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1385.81677 ± 850.200
2025-09-13 20:52:00,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2633.5378, 537.31006, 841.5108, 653.4994, 1200.8746, 957.4171, 668.4105, 2657.4922, 2669.8452, 1038.2693]
2025-09-13 20:52:00,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [486.0, 117.0, 156.0, 140.0, 227.0, 173.0, 128.0, 488.0, 500.0, 195.0]
2025-09-13 20:52:00,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 45 minutes, 44 seconds)
2025-09-13 21:03:36,459 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:03:36,467 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:06:40,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3038.67627 ± 1732.702
2025-09-13 21:06:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5236.36, 1842.3275, 2083.2837, 2482.2039, 596.89923, 5155.423, 3951.684, 424.63824, 3528.4316, 5085.51]
2025-09-13 21:06:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 377.0, 417.0, 472.0, 127.0, 1000.0, 793.0, 84.0, 704.0, 1000.0]
2025-09-13 21:06:40,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3038.68) for latency ExtremeSparseL4U32
2025-09-13 21:06:40,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 37 minutes, 17 seconds)
2025-09-13 21:18:08,013 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:18:08,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:21:14,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3160.15527 ± 1522.077
2025-09-13 21:21:14,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2772.6423, 2302.838, 4229.658, 5125.499, 4574.282, 3495.1733, 2207.3335, 1160.1741, 5112.4062, 621.5453]
2025-09-13 21:21:14,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [547.0, 437.0, 822.0, 1000.0, 880.0, 672.0, 432.0, 216.0, 1000.0, 131.0]
2025-09-13 21:21:14,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3160.16) for latency ExtremeSparseL4U32
2025-09-13 21:21:14,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 25 minutes, 2 seconds)
2025-09-13 21:33:12,634 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:33:12,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:35:45,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2654.94897 ± 1225.615
2025-09-13 21:35:45,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2538.467, 2283.7507, 2793.3972, 1308.4717, 3900.9429, 794.1547, 3072.8855, 1805.1493, 2728.907, 5323.365]
2025-09-13 21:35:45,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [482.0, 425.0, 518.0, 271.0, 735.0, 152.0, 570.0, 343.0, 510.0, 1000.0]
2025-09-13 21:35:45,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 12 minutes, 6 seconds)
2025-09-13 21:47:28,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 21:47:28,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 21:49:58,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2632.67041 ± 1541.766
2025-09-13 21:49:58,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [483.42224, 2738.3953, 5280.34, 3415.9539, 1608.9752, 2709.5354, 1467.2834, 1081.8181, 2336.9832, 5203.997]
2025-09-13 21:49:58,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 511.0, 1000.0, 642.0, 310.0, 502.0, 269.0, 204.0, 424.0, 1000.0]
2025-09-13 21:49:59,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 58 minutes, 59 seconds)
2025-09-13 22:02:00,083 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:02:00,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:04:31,096 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2518.61987 ± 1458.221
2025-09-13 22:04:31,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [3051.444, 1416.1155, 2920.622, 3202.2903, 779.9804, 5062.317, 827.31683, 4761.045, 1561.2002, 1603.8682]
2025-09-13 22:04:31,100 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [613.0, 283.0, 573.0, 605.0, 144.0, 1000.0, 163.0, 940.0, 308.0, 306.0]
2025-09-13 22:04:31,113 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 50 minutes, 1 second)
2025-09-13 22:16:01,592 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:16:01,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:19:25,384 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3410.99561 ± 1781.775
2025-09-13 22:19:25,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5147.86, 4171.698, 5297.668, 507.7483, 911.9224, 1800.4532, 5276.607, 3090.5317, 2696.9258, 5208.544]
2025-09-13 22:19:25,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 800.0, 1000.0, 95.0, 174.0, 358.0, 1000.0, 600.0, 533.0, 1000.0]
2025-09-13 22:19:25,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3411.00) for latency ExtremeSparseL4U32
2025-09-13 22:19:25,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 36 minutes, 25 seconds)
2025-09-13 22:31:01,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:31:01,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:34:01,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3131.54346 ± 1245.888
2025-09-13 22:34:01,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2515.7537, 2695.1099, 1727.1931, 1072.6254, 2578.5427, 4477.1113, 5363.941, 3484.8486, 3040.6323, 4359.677]
2025-09-13 22:34:01,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [478.0, 513.0, 328.0, 202.0, 491.0, 825.0, 1000.0, 649.0, 569.0, 864.0]
2025-09-13 22:34:01,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 22 minutes, 3 seconds)
2025-09-13 22:45:13,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 22:45:13,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 22:48:24,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3328.32568 ± 1663.109
2025-09-13 22:48:24,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5265.134, 2849.9717, 1089.1433, 5272.038, 639.1046, 1849.6471, 3444.102, 4568.792, 3021.804, 5283.5205]
2025-09-13 22:48:24,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 532.0, 228.0, 1000.0, 118.0, 353.0, 659.0, 892.0, 599.0, 1000.0]
2025-09-13 22:48:24,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 6 minutes, 58 seconds)
2025-09-13 23:00:08,713 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:00:08,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:02:58,047 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2846.42993 ± 1649.598
2025-09-13 23:02:58,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5253.4443, 2493.1438, 1991.2189, 4957.564, 5234.378, 1775.768, 545.6003, 1578.7643, 3264.6704, 1369.7452]
2025-09-13 23:02:58,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 493.0, 378.0, 952.0, 1000.0, 341.0, 119.0, 303.0, 641.0, 252.0]
2025-09-13 23:02:58,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 53 minutes, 32 seconds)
2025-09-13 23:14:32,517 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:14:32,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:18:12,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3877.76685 ± 1776.037
2025-09-13 23:18:12,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4563.3237, 5228.229, 1402.7605, 4366.296, 1400.3733, 5317.448, 5225.2007, 5186.2817, 5252.533, 835.22205]
2025-09-13 23:18:12,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [862.0, 1000.0, 267.0, 815.0, 251.0, 1000.0, 959.0, 1000.0, 1000.0, 156.0]
2025-09-13 23:18:12,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1226 [INFO]: New best (3877.77) for latency ExtremeSparseL4U32
2025-09-13 23:18:12,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 41 minutes, 3 seconds)
2025-09-13 23:30:02,026 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:30:02,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:32:32,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2651.31787 ± 1828.378
2025-09-13 23:32:32,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1198.1923, 5302.701, 459.26852, 916.5309, 4550.5537, 3939.7793, 932.82275, 1587.7739, 2325.4749, 5300.081]
2025-09-13 23:32:32,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 1000.0, 80.0, 186.0, 854.0, 731.0, 186.0, 308.0, 448.0, 1000.0]
2025-09-13 23:32:32,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 24 minutes, 44 seconds)
2025-09-13 23:44:07,339 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:44:07,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-13 23:46:06,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2083.42261 ± 1406.406
2025-09-13 23:46:06,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [472.45428, 1190.6473, 1582.1283, 467.74716, 2333.629, 5298.5747, 1361.9978, 1838.9882, 3528.7493, 2759.3098]
2025-09-13 23:46:06,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 227.0, 298.0, 85.0, 451.0, 1000.0, 263.0, 351.0, 663.0, 528.0]
2025-09-13 23:46:06,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 7 minutes, 23 seconds)
2025-09-13 23:58:10,389 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-13 23:58:10,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:01:12,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3159.89209 ± 1579.184
2025-09-14 00:01:12,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2570.5896, 5396.5337, 4113.554, 2841.7, 5254.4224, 2261.6824, 1663.2401, 4957.871, 1888.2175, 651.11005]
2025-09-14 00:01:12,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [493.0, 1000.0, 770.0, 529.0, 1000.0, 424.0, 308.0, 937.0, 353.0, 124.0]
2025-09-14 00:01:12,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 54 minutes, 43 seconds)
2025-09-14 00:13:00,687 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:13:00,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:14:46,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1795.75854 ± 1176.343
2025-09-14 00:14:46,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5137.968, 1703.4885, 1814.6108, 1389.1642, 1077.3535, 1431.0258, 521.9734, 1768.8176, 1325.8387, 1787.3475]
2025-09-14 00:14:46,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 344.0, 360.0, 258.0, 217.0, 271.0, 109.0, 346.0, 257.0, 331.0]
2025-09-14 00:14:46,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 37 minutes, 59 seconds)
2025-09-14 00:25:52,685 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:25:52,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:27:58,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2186.59058 ± 1639.671
2025-09-14 00:27:58,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5251.383, 665.8958, 1397.8171, 1938.7576, 5130.0327, 1530.8352, 526.9265, 746.3029, 2852.2615, 1825.6925]
2025-09-14 00:27:58,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 133.0, 258.0, 369.0, 964.0, 295.0, 99.0, 147.0, 546.0, 351.0]
2025-09-14 00:27:58,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 19 minutes, 31 seconds)
2025-09-14 00:39:54,702 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:39:54,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:41:59,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2192.16260 ± 1620.896
2025-09-14 00:41:59,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [2839.6409, 2927.668, 1153.4087, 610.51324, 4237.276, 1090.3207, 599.23065, 315.5232, 2810.3618, 5337.6826]
2025-09-14 00:41:59,107 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [538.0, 546.0, 217.0, 127.0, 805.0, 208.0, 131.0, 60.0, 531.0, 1000.0]
2025-09-14 00:41:59,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 4 minutes, 59 seconds)
2025-09-14 00:53:48,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 00:53:48,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 00:55:26,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1747.06287 ± 1259.986
2025-09-14 00:55:26,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1733.6116, 647.5934, 2598.2307, 771.501, 498.0204, 4913.687, 2434.9556, 1064.8322, 1742.4268, 1065.7688]
2025-09-14 00:55:26,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [312.0, 118.0, 483.0, 146.0, 91.0, 934.0, 466.0, 201.0, 326.0, 204.0]
2025-09-14 00:55:26,437 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 50 minutes, 56 seconds)
2025-09-14 01:06:41,407 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:06:41,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:09:19,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2817.05249 ± 1789.294
2025-09-14 01:09:19,367 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [4820.0015, 1052.0804, 550.32275, 1514.0542, 1903.6217, 4982.8315, 1121.9686, 4392.011, 5458.7637, 2374.869]
2025-09-14 01:09:19,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [908.0, 189.0, 101.0, 320.0, 357.0, 921.0, 211.0, 821.0, 1000.0, 447.0]
2025-09-14 01:09:19,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 35 minutes, 21 seconds)
2025-09-14 01:21:43,053 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:21:43,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:24:35,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3051.22607 ± 1708.321
2025-09-14 01:24:35,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5279.244, 719.1755, 5286.1587, 816.4217, 1718.2114, 4013.1218, 3387.626, 1963.9805, 5085.951, 2242.3708]
2025-09-14 01:24:35,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 147.0, 1000.0, 156.0, 323.0, 750.0, 636.0, 361.0, 1000.0, 413.0]
2025-09-14 01:24:35,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 23 minutes, 46 seconds)
2025-09-14 01:35:31,526 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:35:31,535 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:38:27,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3090.99072 ± 1938.398
2025-09-14 01:38:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5300.6323, 4778.1147, 493.84103, 5275.891, 665.49976, 1989.1034, 3752.0386, 603.7344, 2829.9707, 5221.0806]
2025-09-14 01:38:27,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 895.0, 92.0, 1000.0, 120.0, 393.0, 703.0, 110.0, 536.0, 1000.0]
2025-09-14 01:38:27,285 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 10 minutes, 29 seconds)
2025-09-14 01:50:03,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 01:50:03,292 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 01:53:31,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3554.89966 ± 1729.401
2025-09-14 01:53:31,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5127.123, 5091.0605, 2108.8115, 1546.6497, 1779.0752, 5123.07, 5101.29, 5117.0854, 3896.6716, 658.1585]
2025-09-14 01:53:31,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 415.0, 297.0, 346.0, 1000.0, 1000.0, 1000.0, 793.0, 120.0]
2025-09-14 01:53:31,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 57 minutes, 14 seconds)
2025-09-14 02:05:24,096 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:05:24,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:09:01,474 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3737.66992 ± 1640.401
2025-09-14 02:09:01,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [5089.492, 694.96875, 2371.9111, 3773.3186, 5225.081, 5345.872, 2153.1257, 5230.9253, 5232.2227, 2259.7805]
2025-09-14 02:09:01,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [985.0, 134.0, 442.0, 715.0, 1000.0, 1000.0, 413.0, 1000.0, 1000.0, 425.0]
2025-09-14 02:09:01,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 44 minutes, 9 seconds)
2025-09-14 02:21:06,658 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:21:06,666 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:23:33,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 2568.06299 ± 1873.065
2025-09-14 02:23:33,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [921.4799, 5229.5137, 2276.7307, 5235.667, 2074.1365, 4063.0457, 586.14886, 479.1896, 501.28928, 4313.4307]
2025-09-14 02:23:33,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 1000.0, 438.0, 1000.0, 400.0, 779.0, 107.0, 87.0, 93.0, 806.0]
2025-09-14 02:23:33,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 29 minutes, 41 seconds)
2025-09-14 02:35:07,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:35:07,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:38:19,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 3278.00317 ± 2098.312
2025-09-14 02:38:19,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1567.8318, 4536.875, 5190.0566, 375.9404, 391.39774, 4960.2896, 4844.4033, 5129.5303, 5138.9824, 644.7235]
2025-09-14 02:38:19,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [300.0, 884.0, 1000.0, 70.0, 70.0, 971.0, 936.0, 1000.0, 1000.0, 129.0]
2025-09-14 02:38:19,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 44 seconds)
2025-09-14 02:49:45,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-14 02:49:45,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-09-14 02:51:29,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1221 [DEBUG]: Total Reward: 1773.18164 ± 1194.826
2025-09-14 02:51:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1222 [DEBUG]: All rewards: [1051.7864, 2819.3875, 1158.6143, 1173.8043, 978.29425, 915.0209, 1264.663, 2133.3909, 1308.9828, 4927.8735]
2025-09-14 02:51:29,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 520.0, 214.0, 224.0, 205.0, 187.0, 235.0, 432.0, 253.0, 938.0]
2025-09-14 02:51:29,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-humanoid):1251 [DEBUG]: Training session finished
