2025-05-09 09:43:49,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-05-09 09:43:49,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay
2025-05-09 09:43:49,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x14bcbc30d490>}
2025-05-09 09:43:49,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1111 [DEBUG]: using device: cuda
2025-05-09 09:43:49,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1133 [INFO]: Creating new trainer
2025-05-09 09:43:49,588 baseline-mbpac-noisy-humanoid:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
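The policy printout above can be read as a standard squashed-Gaussian actor: a shared 512→256→256 ReLU trunk, separate 17-dim `mu`/`log_std` heads, and a tanh squash rescaled by the printed `NNTanhRefit` buffers (scale 0.8, shift −0.4, i.e. actions in (−1.2, 0.4)). A minimal sketch consistent with those printed shapes — the class and method names here are hypothetical reconstructions, not the project's actual code:

```python
import torch
import torch.nn as nn

class TanhRefit(nn.Module):
    """Squash with tanh, then apply the printed affine refit:
    scale=0.8, shift=-0.4 => outputs lie in (-1.2, 0.4)."""
    def __init__(self, act_dim, scale=0.8, shift=-0.4):
        super().__init__()
        self.register_buffer("scale", torch.full((1, act_dim), scale))
        self.register_buffer("shift", torch.full((1, act_dim), shift))

    def forward(self, x):
        return torch.tanh(x) * self.scale + self.shift

class GaussianPolicy(nn.Module):
    """Hypothetical reconstruction of the printed NNGaussianPolicy:
    shared 512->256->256 ReLU trunk, separate 17-dim mu/log_std heads."""
    def __init__(self, obs_dim=512, act_dim=17, hidden=256):
        super().__init__()
        self.common_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)
        self.tanh_refit = TanhRefit(act_dim)

    def forward(self, obs):
        h = self.common_head(obs)
        mu, log_std = self.mu_head(h), self.log_std_head(h)
        # reparameterized sample, squashed into the action box
        a = mu + torch.randn_like(mu) * log_std.exp()
        return self.tanh_refit(a), mu, log_std

policy = GaussianPolicy()
act, mu, log_std = policy(torch.randn(4, 512))  # act has shape (4, 17)
```

Note the 512-dim input matches the trunk's `in_features=512`, which is larger than the 376-dim state the model below predicts; under this run's memory-delay setup that input is presumably an augmented/recurrent feature, not the raw observation.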
2025-05-09 09:43:49,588 baseline-mbpac-noisy-humanoid:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
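The critic printout is a concat-style Q-network: `in_features=393` is consistent with flattening a 376-dim state and a 17-dim action and concatenating them (376 + 17 = 393) before a 256→256→1 ReLU MLP, with the final `NNLayerSqueeze` dropping the trailing singleton dim. A sketch under those assumed dimensions (the class name is hypothetical):

```python
import torch
import torch.nn as nn

class ConcatQNetwork(nn.Module):
    """Hypothetical reconstruction of the printed NNLayerConcat2 critic:
    flatten state (376) and action (17), concatenate to 393 features,
    and regress a scalar Q-value."""
    def __init__(self, obs_dim=376, act_dim=17, hidden=256):
        super().__init__()
        self.flatten = nn.Flatten()  # init_left / init_right in the printout
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        x = torch.cat([self.flatten(obs), self.flatten(act)], dim=-1)
        return self.net(x).squeeze(-1)  # NNLayerSqueeze(dim=-1)

q = ConcatQNetwork()
values = q(torch.randn(8, 376), torch.randn(8, 17))  # shape (8,)
```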
2025-05-09 09:43:49,599 baseline-mbpac-noisy-humanoid:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=512, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=376, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=376, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=512, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 512, batch_first=True)
)
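The dynamics-model printout suggests a GRU latent model: `net_embed_state` maps the 376-dim observation to the GRU's 512-dim initial hidden state, `net_embed_action` maps 17-dim actions to the GRU's 256-dim input, and the Gaussian emitter reads each 512-dim hidden state to predict `mu`/`log_std` of the next 376-dim observation. A simplified sketch under those assumptions — plain `SiLU` stands in for the logged `NNLayerClipSiLU`, and the emitter's inner layers are collapsed into single linear heads:

```python
import torch
import torch.nn as nn

class RecurrentDynamicsModel(nn.Module):
    """Hypothetical, simplified reconstruction of NNPredictiveRecurrent:
    state -> GRU initial hidden state, actions -> GRU inputs,
    hidden states -> Gaussian (mu, log_std) over next observations."""
    def __init__(self, obs_dim=376, act_dim=17, hidden=256, rnn_hidden=512):
        super().__init__()
        self.net_embed_state = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, rnn_hidden),
        )
        self.net_embed_action = nn.Sequential(
            nn.Linear(act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden),
        )
        self.net_rec = nn.GRU(hidden, rnn_hidden, batch_first=True)
        # emitter heads, collapsed from the printed NNLayerHeadSplit
        self.mu_head = nn.Linear(rnn_hidden, obs_dim)
        self.log_std_head = nn.Linear(rnn_hidden, obs_dim)

    def forward(self, obs0, actions):
        # obs0: (B, obs_dim); actions: (B, T, act_dim)
        h0 = self.net_embed_state(obs0).unsqueeze(0)               # (1, B, 512)
        out, _ = self.net_rec(self.net_embed_action(actions), h0)  # (B, T, 512)
        return self.mu_head(out), self.log_std_head(out)

model = RecurrentDynamicsModel()
mu, log_std = model(torch.randn(2, 376), torch.randn(2, 5, 17))
```

The emitter's printed `in_features=512` matches the GRU hidden size, which is what motivates wiring the heads directly off the recurrent output here.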
2025-05-09 09:43:50,472 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1194 [DEBUG]: Starting training session...
2025-05-09 09:43:50,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 1/100
2025-05-09 09:53:08,293 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 09:53:08,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 09:53:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 252.93277 ± 18.693
2025-05-09 09:53:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [251.50555, 275.93927, 287.25156, 264.00806, 247.50742, 229.94882, 230.02782, 252.27351, 230.14674, 260.7188]
2025-05-09 09:53:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 50.0, 52.0, 48.0, 46.0, 42.0, 42.0, 46.0, 42.0, 48.0]
2025-05-09 09:53:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (252.93) for latency MM1Queue_a033_s075
2025-05-09 09:53:19,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 09:53:19,191 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
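Each evaluation block reports "Total Reward: mean ± spread" over the ten listed episode returns. The iteration-1 numbers are consistent with the spread being the population (not sample) standard deviation, which can be checked with the stdlib:

```python
import statistics

# Episode returns from the iteration-1 evaluation block above
rewards = [251.50555, 275.93927, 287.25156, 264.00806, 247.50742,
           229.94882, 230.02782, 252.27351, 230.14674, 260.7188]

mean = statistics.mean(rewards)    # ~252.93, matching the logged 252.93277
std = statistics.pstdev(rewards)   # ~18.693, matching the logged +/- 18.693
print(f"{mean:.5f} \u00b1 {std:.3f}")
```

`statistics.stdev` (the sample estimator, dividing by n−1) would instead give roughly 19.7, so it does not match the logged value.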
2025-05-09 09:53:19,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 15 hours, 38 minutes, 24 seconds)
2025-05-09 10:03:35,127 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:03:35,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:03:51,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 351.99738 ± 73.203
2025-05-09 10:03:51,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [337.51288, 354.4135, 361.7673, 387.85794, 352.98148, 273.07007, 257.497, 358.46982, 299.83566, 536.56836]
2025-05-09 10:03:51,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 68.0, 77.0, 71.0, 72.0, 56.0, 50.0, 77.0, 65.0, 103.0]
2025-05-09 10:03:51,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (352.00) for latency MM1Queue_a033_s075
2025-05-09 10:03:51,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:03:51,878 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:03:52,335 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 16 hours, 21 minutes, 31 seconds)
2025-05-09 10:13:58,971 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:13:58,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:14:14,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 344.89301 ± 38.995
2025-05-09 10:14:14,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [326.69977, 336.79684, 288.79327, 404.88278, 378.76617, 284.71597, 401.43076, 340.29922, 337.25906, 349.28613]
2025-05-09 10:14:14,179 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 68.0, 53.0, 75.0, 71.0, 52.0, 74.0, 62.0, 61.0, 64.0]
2025-05-09 10:14:14,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 16 hours, 22 minutes, 46 seconds)
2025-05-09 10:24:21,977 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:24:21,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:24:39,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 408.71506 ± 71.901
2025-05-09 10:24:39,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [361.53003, 351.1599, 335.39923, 477.76758, 361.10617, 363.34534, 513.7089, 546.5045, 367.72458, 408.90463]
2025-05-09 10:24:39,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 64.0, 61.0, 89.0, 67.0, 67.0, 95.0, 103.0, 68.0, 75.0]
2025-05-09 10:24:39,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (408.72) for latency MM1Queue_a033_s075
2025-05-09 10:24:39,741 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:24:39,749 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:24:39,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 16 hours, 19 minutes, 43 seconds)
2025-05-09 10:34:47,268 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:34:47,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:35:07,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 440.33148 ± 45.758
2025-05-09 10:35:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [450.24762, 397.0478, 386.2347, 481.8043, 425.77957, 394.56345, 401.7042, 532.7036, 449.19742, 484.0322]
2025-05-09 10:35:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 74.0, 71.0, 93.0, 86.0, 73.0, 73.0, 102.0, 87.0, 92.0]
2025-05-09 10:35:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (440.33) for latency MM1Queue_a033_s075
2025-05-09 10:35:07,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:35:07,278 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:35:07,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 16 hours, 14 minutes, 19 seconds)
2025-05-09 10:45:14,633 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:45:14,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:45:31,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 403.03256 ± 56.024
2025-05-09 10:45:31,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [450.29782, 399.8173, 478.8832, 356.73166, 374.52133, 316.10257, 415.82983, 329.07355, 423.40817, 485.66006]
2025-05-09 10:45:31,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 73.0, 90.0, 79.0, 68.0, 58.0, 76.0, 60.0, 76.0, 90.0]
2025-05-09 10:45:32,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 16 hours, 21 minutes, 44 seconds)
2025-05-09 10:55:40,475 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 10:55:40,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 10:56:03,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 500.52588 ± 103.426
2025-05-09 10:56:03,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [434.3388, 330.7859, 742.3189, 563.34753, 421.81094, 506.65356, 448.32687, 545.69885, 493.58807, 518.38916]
2025-05-09 10:56:03,405 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 62.0, 141.0, 121.0, 75.0, 90.0, 99.0, 102.0, 99.0, 95.0]
2025-05-09 10:56:03,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (500.53) for latency MM1Queue_a033_s075
2025-05-09 10:56:03,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 10:56:03,414 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 10:56:03,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 16 hours, 10 minutes, 38 seconds)
2025-05-09 11:06:11,599 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:06:11,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:06:32,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 467.63696 ± 120.280
2025-05-09 11:06:32,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [598.27515, 427.36792, 373.79614, 502.41956, 744.8907, 335.7285, 410.46564, 509.98914, 428.79413, 344.64307]
2025-05-09 11:06:32,732 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 80.0, 76.0, 99.0, 139.0, 72.0, 73.0, 95.0, 92.0, 66.0]
2025-05-09 11:06:32,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 16 hours, 2 minutes, 29 seconds)
2025-05-09 11:16:45,288 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:16:45,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:17:07,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 508.52594 ± 121.271
2025-05-09 11:17:07,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [404.0728, 546.179, 453.92172, 486.0189, 707.68805, 352.70285, 382.10007, 599.84326, 446.4088, 706.3237]
2025-05-09 11:17:07,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 103.0, 85.0, 87.0, 133.0, 65.0, 82.0, 109.0, 83.0, 128.0]
2025-05-09 11:17:07,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (508.53) for latency MM1Queue_a033_s075
2025-05-09 11:17:07,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:17:07,921 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:17:08,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 15 hours, 55 minutes, 4 seconds)
2025-05-09 11:27:11,867 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:27:11,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:27:35,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 506.62939 ± 124.122
2025-05-09 11:27:35,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [377.76465, 419.63745, 733.7814, 382.51035, 519.1272, 424.60718, 579.2116, 517.02344, 405.17352, 707.45685]
2025-05-09 11:27:35,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 84.0, 146.0, 72.0, 97.0, 86.0, 122.0, 98.0, 75.0, 132.0]
2025-05-09 11:27:35,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 15 hours, 44 minutes, 24 seconds)
2025-05-09 11:37:45,088 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:37:45,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:38:09,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 542.90973 ± 126.034
2025-05-09 11:38:09,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [402.7044, 508.87366, 385.51138, 383.23883, 604.67426, 694.1745, 754.42816, 613.6595, 465.7141, 616.11884]
2025-05-09 11:38:09,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 91.0, 72.0, 71.0, 133.0, 127.0, 141.0, 112.0, 87.0, 115.0]
2025-05-09 11:38:09,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (542.91) for latency MM1Queue_a033_s075
2025-05-09 11:38:09,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:38:09,092 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:38:09,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 15 hours, 36 minutes, 39 seconds)
2025-05-09 11:48:15,775 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:48:15,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:48:38,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 490.35590 ± 75.170
2025-05-09 11:48:38,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [461.10864, 506.91806, 683.80664, 448.88782, 470.97855, 542.08093, 509.57874, 447.46487, 419.8245, 412.90997]
2025-05-09 11:48:38,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 106.0, 129.0, 96.0, 99.0, 104.0, 103.0, 90.0, 80.0, 79.0]
2025-05-09 11:48:38,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 15 hours, 25 minutes, 33 seconds)
2025-05-09 11:58:45,607 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 11:58:45,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 11:59:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 569.99329 ± 114.335
2025-05-09 11:59:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [595.70825, 696.953, 649.56616, 709.4521, 451.41962, 398.90335, 419.6176, 708.88617, 519.8978, 549.5288]
2025-05-09 11:59:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 130.0, 121.0, 130.0, 93.0, 73.0, 81.0, 132.0, 99.0, 98.0]
2025-05-09 11:59:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (569.99) for latency MM1Queue_a033_s075
2025-05-09 11:59:10,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 11:59:10,910 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 11:59:11,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 15 hours, 15 minutes, 54 seconds)
2025-05-09 12:09:22,696 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:09:22,698 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:09:48,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 570.57092 ± 100.594
2025-05-09 12:09:48,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [702.04504, 535.7738, 582.89905, 444.40814, 738.0331, 436.66348, 647.2553, 519.2454, 474.64197, 624.744]
2025-05-09 12:09:48,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 108.0, 121.0, 95.0, 139.0, 91.0, 130.0, 98.0, 93.0, 126.0]
2025-05-09 12:09:48,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (570.57) for latency MM1Queue_a033_s075
2025-05-09 12:09:48,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:09:48,881 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:09:48,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 15 hours, 6 minutes, 1 second)
2025-05-09 12:20:01,487 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:20:01,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:20:31,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 688.52100 ± 128.347
2025-05-09 12:20:31,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [706.38495, 696.99646, 804.11224, 472.40802, 904.71185, 636.8361, 704.3341, 829.89044, 520.0638, 609.47205]
2025-05-09 12:20:31,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 127.0, 148.0, 86.0, 184.0, 115.0, 130.0, 152.0, 103.0, 112.0]
2025-05-09 12:20:31,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (688.52) for latency MM1Queue_a033_s075
2025-05-09 12:20:31,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 12:20:31,570 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 12:20:31,588 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 14 hours, 59 minutes, 56 seconds)
2025-05-09 12:30:34,396 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:30:34,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:31:03,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 649.61389 ± 194.854
2025-05-09 12:31:03,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [499.58856, 567.6428, 519.789, 451.9311, 971.5107, 950.40027, 497.18018, 587.51294, 549.38983, 901.19336]
2025-05-09 12:31:03,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 118.0, 98.0, 83.0, 185.0, 180.0, 92.0, 123.0, 111.0, 172.0]
2025-05-09 12:31:03,721 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 14 hours, 48 minutes, 43 seconds)
2025-05-09 12:41:15,180 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:41:15,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:41:40,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 586.96008 ± 140.169
2025-05-09 12:41:40,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [739.69916, 585.61066, 558.1301, 549.39844, 426.76443, 496.85938, 613.75934, 564.2317, 915.4521, 419.69528]
2025-05-09 12:41:40,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 109.0, 106.0, 100.0, 80.0, 94.0, 125.0, 104.0, 173.0, 77.0]
2025-05-09 12:41:40,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 14 hours, 40 minutes, 24 seconds)
2025-05-09 12:51:55,578 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 12:51:55,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 12:52:27,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 688.14490 ± 174.799
2025-05-09 12:52:27,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [974.8992, 607.94556, 642.51373, 935.5469, 514.3955, 780.83844, 754.29456, 754.3605, 443.76898, 472.88644]
2025-05-09 12:52:27,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 127.0, 119.0, 192.0, 109.0, 151.0, 157.0, 135.0, 81.0, 86.0]
2025-05-09 12:52:27,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 14 hours, 33 minutes, 48 seconds)
2025-05-09 13:02:26,516 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:02:26,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:02:59,449 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 732.73090 ± 161.454
2025-05-09 13:02:59,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [708.5458, 688.97845, 928.8757, 599.86755, 627.2783, 1100.2084, 598.633, 836.969, 643.0755, 594.87714]
2025-05-09 13:02:59,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 129.0, 169.0, 108.0, 137.0, 225.0, 129.0, 157.0, 119.0, 111.0]
2025-05-09 13:02:59,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (732.73) for latency MM1Queue_a033_s075
2025-05-09 13:02:59,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:02:59,507 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:02:59,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 14 hours, 21 minutes, 27 seconds)
2025-05-09 13:13:05,585 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:13:06,004 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:13:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 824.69275 ± 315.792
2025-05-09 13:13:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [490.78754, 1512.1565, 1231.0156, 647.528, 494.0246, 518.0484, 853.4869, 829.634, 913.286, 756.95953]
2025-05-09 13:13:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 311.0, 234.0, 128.0, 93.0, 94.0, 162.0, 157.0, 188.0, 148.0]
2025-05-09 13:13:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (824.69) for latency MM1Queue_a033_s075
2025-05-09 13:13:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:13:43,568 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:13:43,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 14 hours, 11 minutes, 12 seconds)
2025-05-09 13:23:54,275 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:23:54,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:24:33,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 851.44763 ± 288.956
2025-05-09 13:24:33,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [825.8875, 612.8686, 445.76138, 911.8586, 1185.5253, 693.3673, 1382.278, 1068.6298, 911.0807, 477.21973]
2025-05-09 13:24:33,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 130.0, 93.0, 167.0, 237.0, 142.0, 267.0, 219.0, 172.0, 99.0]
2025-05-09 13:24:33,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (851.45) for latency MM1Queue_a033_s075
2025-05-09 13:24:33,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:24:33,808 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:24:33,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 14 hours, 5 minutes, 19 seconds)
2025-05-09 13:34:40,360 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:34:40,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:35:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 902.87781 ± 382.849
2025-05-09 13:35:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1119.9971, 1178.5465, 1367.0927, 1654.0239, 523.6516, 578.05133, 732.06683, 529.983, 812.94934, 532.4159]
2025-05-09 13:35:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 208.0, 286.0, 313.0, 103.0, 112.0, 134.0, 100.0, 150.0, 96.0]
2025-05-09 13:35:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (902.88) for latency MM1Queue_a033_s075
2025-05-09 13:35:20,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:35:20,503 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:35:20,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 13 hours, 57 minutes, 5 seconds)
2025-05-09 13:45:35,473 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:45:35,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:46:21,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1007.74335 ± 254.640
2025-05-09 13:46:21,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [797.07434, 1151.2891, 1138.5845, 612.89014, 959.7437, 1087.249, 661.3469, 1526.7218, 1118.0488, 1024.4858]
2025-05-09 13:46:21,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 222.0, 229.0, 133.0, 185.0, 210.0, 120.0, 279.0, 227.0, 200.0]
2025-05-09 13:46:21,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (1007.74) for latency MM1Queue_a033_s075
2025-05-09 13:46:21,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 13:46:21,665 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 13:46:21,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 13 hours, 50 minutes)
2025-05-09 13:56:32,327 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 13:56:32,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 13:57:13,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 903.39471 ± 257.557
2025-05-09 13:57:13,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1088.6863, 1330.478, 1058.2067, 1001.4418, 489.15158, 600.66235, 881.9777, 1139.5417, 604.2233, 839.5783]
2025-05-09 13:57:13,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 258.0, 212.0, 193.0, 91.0, 126.0, 187.0, 240.0, 131.0, 165.0]
2025-05-09 13:57:13,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 13 hours, 44 minutes, 26 seconds)
2025-05-09 14:07:37,961 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:07:37,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:08:30,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1185.29077 ± 699.993
2025-05-09 14:08:30,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2973.448, 1616.8712, 415.57675, 1167.8462, 1578.5912, 883.8755, 1067.7477, 767.3114, 703.275, 678.3639]
2025-05-09 14:08:30,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [551.0, 304.0, 80.0, 216.0, 295.0, 164.0, 215.0, 140.0, 142.0, 127.0]
2025-05-09 14:08:30,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (1185.29) for latency MM1Queue_a033_s075
2025-05-09 14:08:30,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 14:08:30,888 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:08:32,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 13 hours, 42 minutes, 10 seconds)
2025-05-09 14:18:24,991 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:18:24,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:19:07,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 934.23877 ± 191.035
2025-05-09 14:19:07,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1211.8229, 1070.6998, 671.77203, 798.9936, 1145.6399, 796.0917, 753.9893, 788.0732, 1187.5917, 917.7139]
2025-05-09 14:19:07,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 194.0, 124.0, 177.0, 223.0, 150.0, 145.0, 150.0, 226.0, 173.0]
2025-05-09 14:19:07,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 13 hours, 27 minutes, 24 seconds)
2025-05-09 14:29:17,822 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:29:17,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:30:03,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1031.34644 ± 369.458
2025-05-09 14:30:03,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1220.2722, 845.5825, 1127.3439, 582.7534, 1106.7726, 840.58514, 1256.7491, 637.35376, 784.2664, 1911.7859]
2025-05-09 14:30:03,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [248.0, 161.0, 215.0, 117.0, 217.0, 171.0, 230.0, 122.0, 146.0, 353.0]
2025-05-09 14:30:03,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 13 hours, 18 minutes, 55 seconds)
2025-05-09 14:40:09,807 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:40:09,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:41:12,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1449.46460 ± 449.755
2025-05-09 14:41:12,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1274.926, 1260.653, 2562.909, 1506.5786, 673.1045, 1317.7896, 1396.839, 1372.5931, 1735.4812, 1393.7727]
2025-05-09 14:41:12,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 239.0, 469.0, 281.0, 120.0, 240.0, 264.0, 258.0, 332.0, 273.0]
2025-05-09 14:41:12,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (1449.46) for latency MM1Queue_a033_s075
2025-05-09 14:41:12,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 14:41:12,317 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 14:41:12,337 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 13 hours, 9 minutes, 45 seconds)
2025-05-09 14:51:24,966 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 14:51:25,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 14:52:14,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1126.04419 ± 416.242
2025-05-09 14:52:14,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1502.9905, 768.0661, 485.23917, 831.7455, 1178.7415, 1064.0704, 936.1313, 1244.4121, 1170.9589, 2078.0862]
2025-05-09 14:52:14,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [272.0, 142.0, 91.0, 159.0, 211.0, 217.0, 167.0, 227.0, 243.0, 380.0]
2025-05-09 14:52:14,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 13 hours, 1 minute, 2 seconds)
2025-05-09 15:02:22,183 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:02:22,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:03:27,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1448.39966 ± 611.818
2025-05-09 15:03:27,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [703.5983, 2082.7861, 1343.3068, 2775.2063, 763.77435, 808.19214, 1577.8146, 1307.2109, 1368.7764, 1753.3296]
2025-05-09 15:03:27,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 422.0, 245.0, 533.0, 149.0, 146.0, 294.0, 247.0, 253.0, 340.0]
2025-05-09 15:03:27,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 12 hours, 48 minutes, 49 seconds)
2025-05-09 15:13:44,692 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:13:45,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:14:43,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1331.36584 ± 320.029
2025-05-09 15:14:43,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1326.4932, 1517.0709, 942.2087, 1010.11255, 1360.4147, 865.6821, 1986.3083, 1274.322, 1422.5359, 1608.5095]
2025-05-09 15:14:43,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [244.0, 287.0, 177.0, 191.0, 258.0, 167.0, 373.0, 235.0, 262.0, 294.0]
2025-05-09 15:14:43,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 12 hours, 47 minutes, 16 seconds)
2025-05-09 15:24:41,567 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:24:41,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:25:43,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1415.15356 ± 715.960
2025-05-09 15:25:43,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [522.3173, 1202.2886, 2385.3926, 487.36218, 1304.045, 1057.6213, 1331.6747, 1358.7772, 1567.7765, 2934.2808]
2025-05-09 15:25:43,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 212.0, 434.0, 93.0, 244.0, 199.0, 247.0, 264.0, 280.0, 553.0]
2025-05-09 15:25:43,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 12 hours, 36 minutes, 54 seconds)
2025-05-09 15:35:53,966 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:35:53,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:37:00,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1511.14368 ± 820.871
2025-05-09 15:37:00,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [790.3204, 959.1881, 1046.2542, 1787.0754, 3807.6665, 1250.2516, 1729.8666, 1331.1915, 1127.9674, 1281.6539]
2025-05-09 15:37:00,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 171.0, 198.0, 328.0, 705.0, 229.0, 334.0, 259.0, 212.0, 245.0]
2025-05-09 15:37:00,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (1511.14) for latency MM1Queue_a033_s075
2025-05-09 15:37:00,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 15:37:00,413 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:37:00,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 12 hours, 27 minutes, 44 seconds)
2025-05-09 15:47:12,357 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:47:12,360 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:48:30,743 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1761.42065 ± 946.045
2025-05-09 15:48:30,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1114.9265, 1539.268, 1360.1993, 1272.3392, 4429.403, 1363.7407, 1248.5144, 2271.134, 1229.6223, 1785.0586]
2025-05-09 15:48:30,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 298.0, 263.0, 241.0, 833.0, 264.0, 243.0, 434.0, 243.0, 339.0]
2025-05-09 15:48:30,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (1761.42) for latency MM1Queue_a033_s075
2025-05-09 15:48:30,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 15:48:30,764 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 15:48:30,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 12 hours, 22 minutes, 52 seconds)
2025-05-09 15:58:42,621 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 15:58:42,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 15:59:45,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1416.27930 ± 632.569
2025-05-09 15:59:45,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [384.00528, 1629.9075, 2411.5361, 1485.1893, 1023.3258, 725.95325, 733.5018, 1811.2657, 2043.5759, 1914.5312]
2025-05-09 15:59:45,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 321.0, 454.0, 275.0, 196.0, 135.0, 146.0, 351.0, 400.0, 368.0]
2025-05-09 15:59:45,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 12 hours, 11 minutes, 50 seconds)
2025-05-09 16:10:16,054 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:10:16,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:11:35,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1793.56079 ± 789.389
2025-05-09 16:11:35,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2033.3434, 1264.218, 1206.9828, 946.8095, 2885.5208, 1668.7557, 493.33014, 2592.6465, 2924.807, 1919.1952]
2025-05-09 16:11:35,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [394.0, 240.0, 235.0, 185.0, 554.0, 309.0, 89.0, 487.0, 551.0, 371.0]
2025-05-09 16:11:35,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (1793.56) for latency MM1Queue_a033_s075
2025-05-09 16:11:35,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 16:11:35,174 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:11:35,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 12 hours, 7 minutes, 56 seconds)
2025-05-09 16:21:25,492 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:21:25,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:22:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2063.78442 ± 1078.233
2025-05-09 16:22:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1534.1376, 3565.8572, 4419.115, 2009.8193, 1513.1798, 1362.2166, 1162.8768, 2431.6968, 732.8426, 1906.1008]
2025-05-09 16:22:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [297.0, 683.0, 838.0, 376.0, 298.0, 262.0, 230.0, 468.0, 138.0, 369.0]
2025-05-09 16:22:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (2063.78) for latency MM1Queue_a033_s075
2025-05-09 16:22:56,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 16:22:56,953 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:22:56,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 12 hours, 1 minute, 7 seconds)
2025-05-09 16:33:20,555 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:33:20,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:35:09,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2466.99536 ± 1229.851
2025-05-09 16:35:09,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [3396.404, 1669.2665, 5370.534, 1601.6477, 1982.6381, 1212.5992, 1599.3093, 2073.391, 2049.8264, 3714.337]
2025-05-09 16:35:09,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [627.0, 325.0, 1000.0, 307.0, 383.0, 228.0, 307.0, 405.0, 384.0, 725.0]
2025-05-09 16:35:09,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (2467.00) for latency MM1Queue_a033_s075
2025-05-09 16:35:09,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 16:35:09,040 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 16:35:09,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 12 hours, 58 seconds)
2025-05-09 16:45:15,714 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:45:15,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:46:03,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1060.88721 ± 538.295
2025-05-09 16:46:03,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1807.3656, 719.2951, 545.30597, 1639.7784, 496.80466, 800.32733, 1099.7025, 477.714, 2005.9402, 1016.6383]
2025-05-09 16:46:03,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [351.0, 132.0, 117.0, 310.0, 93.0, 141.0, 209.0, 99.0, 385.0, 181.0]
2025-05-09 16:46:03,740 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 11 hours, 42 minutes, 5 seconds)
2025-05-09 16:56:23,850 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 16:56:23,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 16:57:19,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1300.39929 ± 631.800
2025-05-09 16:57:19,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1272.3882, 531.13, 858.5572, 2639.272, 1995.5327, 1397.8389, 1048.5594, 424.6471, 1592.1008, 1243.967]
2025-05-09 16:57:19,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 102.0, 150.0, 485.0, 378.0, 254.0, 194.0, 76.0, 291.0, 229.0]
2025-05-09 16:57:19,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 11 hours, 30 minutes, 51 seconds)
2025-05-09 17:07:20,710 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:07:20,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:08:46,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 1997.73755 ± 1007.389
2025-05-09 17:08:46,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1577.6992, 1364.0085, 532.2155, 2646.1865, 1686.807, 4147.6533, 1701.558, 1171.9678, 1923.4965, 3225.7842]
2025-05-09 17:08:46,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [285.0, 256.0, 99.0, 485.0, 313.0, 758.0, 317.0, 205.0, 369.0, 593.0]
2025-05-09 17:08:46,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 11 hours, 14 minutes, 47 seconds)
2025-05-09 17:18:44,677 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:18:44,681 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:20:36,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2511.28125 ± 1591.875
2025-05-09 17:20:36,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1460.0084, 1784.2369, 4361.9873, 4520.7344, 2448.1099, 539.8855, 1673.6387, 1742.032, 1072.5442, 5509.635]
2025-05-09 17:20:36,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 332.0, 809.0, 856.0, 466.0, 110.0, 323.0, 329.0, 206.0, 1000.0]
2025-05-09 17:20:36,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (2511.28) for latency MM1Queue_a033_s075
2025-05-09 17:20:36,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 17:20:36,289 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:20:36,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 11 hours, 8 minutes, 48 seconds)
2025-05-09 17:31:01,760 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:31:01,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:33:12,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3079.02222 ± 1637.501
2025-05-09 17:33:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2097.3145, 5038.931, 3337.1943, 592.2123, 5510.855, 1844.7639, 1877.8711, 4555.392, 4496.5454, 1439.1432]
2025-05-09 17:33:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [400.0, 930.0, 602.0, 117.0, 1000.0, 341.0, 348.0, 841.0, 830.0, 283.0]
2025-05-09 17:33:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (3079.02) for latency MM1Queue_a033_s075
2025-05-09 17:33:12,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 17:33:12,738 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 17:33:12,759 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 11 hours, 1 minute, 54 seconds)
2025-05-09 17:43:21,055 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:43:21,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:45:20,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2812.71460 ± 1324.446
2025-05-09 17:45:20,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1677.3079, 2088.9377, 1972.3604, 1795.5164, 5540.4673, 1559.2551, 1808.1959, 3777.2612, 3489.2888, 4418.557]
2025-05-09 17:45:20,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [317.0, 377.0, 363.0, 332.0, 1000.0, 296.0, 339.0, 695.0, 660.0, 810.0]
2025-05-09 17:45:20,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 11 hours, 4 minutes, 1 second)
2025-05-09 17:55:57,162 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 17:55:57,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 17:58:07,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2942.33252 ± 1653.869
2025-05-09 17:58:07,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1935.3745, 1107.131, 2061.9407, 5430.6084, 4468.272, 4713.838, 4856.3306, 1644.9855, 2496.9456, 707.898]
2025-05-09 17:58:07,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [367.0, 207.0, 387.0, 1000.0, 841.0, 864.0, 895.0, 308.0, 482.0, 143.0]
2025-05-09 17:58:07,055 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 11 hours, 8 minutes, 45 seconds)
2025-05-09 18:08:00,140 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:08:00,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:09:45,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2410.68359 ± 1363.637
2025-05-09 18:09:45,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1923.5931, 421.4728, 2827.7634, 1927.2838, 4298.669, 1113.4131, 5177.8184, 2360.196, 2658.3198, 1398.3042]
2025-05-09 18:09:45,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [352.0, 77.0, 526.0, 363.0, 780.0, 209.0, 975.0, 440.0, 495.0, 265.0]
2025-05-09 18:09:45,444 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 10 hours, 58 minutes, 36 seconds)
2025-05-09 18:19:51,014 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:19:51,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:22:03,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3023.91211 ± 1123.010
2025-05-09 18:22:03,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [3617.7239, 2434.2751, 4170.019, 1868.0753, 2257.1675, 3961.144, 1945.1387, 1995.1207, 2637.598, 5352.8604]
2025-05-09 18:22:03,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [711.0, 466.0, 796.0, 351.0, 440.0, 743.0, 371.0, 371.0, 502.0, 1000.0]
2025-05-09 18:22:03,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 10 hours, 51 minutes, 27 seconds)
2025-05-09 18:32:21,069 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:32:21,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:34:48,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3367.03271 ± 2157.781
2025-05-09 18:34:48,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5533.381, 1485.8007, 689.94403, 5422.765, 5569.115, 416.19055, 5396.607, 2200.6677, 1493.6877, 5462.17]
2025-05-09 18:34:48,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 284.0, 130.0, 1000.0, 1000.0, 85.0, 1000.0, 422.0, 283.0, 1000.0]
2025-05-09 18:34:48,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (3367.03) for latency MM1Queue_a033_s075
2025-05-09 18:34:48,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 18:34:48,314 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 18:34:48,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 10 hours, 40 minutes, 33 seconds)
2025-05-09 18:45:33,316 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:45:33,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:47:51,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3185.03638 ± 1196.033
2025-05-09 18:47:51,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2455.0752, 5383.4834, 3831.067, 5270.8013, 2115.4773, 2737.7827, 2078.0806, 2807.1448, 3194.2805, 1977.1725]
2025-05-09 18:47:51,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [465.0, 998.0, 722.0, 990.0, 406.0, 521.0, 396.0, 533.0, 603.0, 370.0]
2025-05-09 18:47:51,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 10 hours, 37 minutes, 38 seconds)
2025-05-09 18:57:17,031 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 18:57:17,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 18:59:08,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 2604.31104 ± 1295.717
2025-05-09 18:59:08,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1152.6268, 1655.1786, 3336.4082, 1194.7693, 4724.603, 2109.8906, 2899.1802, 4949.491, 2346.373, 1674.5902]
2025-05-09 18:59:08,696 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 297.0, 598.0, 227.0, 866.0, 404.0, 550.0, 909.0, 443.0, 307.0]
2025-05-09 18:59:08,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 10 hours, 10 minutes, 16 seconds)
2025-05-09 19:09:38,008 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:09:38,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:12:33,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4152.72607 ± 1159.590
2025-05-09 19:12:33,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [3493.543, 5417.0723, 4793.8716, 4100.531, 3577.1125, 3743.0579, 5470.809, 1599.8237, 3738.5034, 5592.9365]
2025-05-09 19:12:33,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [629.0, 1000.0, 865.0, 741.0, 650.0, 710.0, 1000.0, 297.0, 714.0, 1000.0]
2025-05-09 19:12:33,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (4152.73) for latency MM1Queue_a033_s075
2025-05-09 19:12:33,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 19:12:33,501 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:12:33,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 10 hours, 15 minutes, 27 seconds)
2025-05-09 19:22:41,772 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:22:41,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:26:10,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4970.59814 ± 1054.107
2025-05-09 19:26:10,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5646.973, 5529.6836, 4172.4272, 5600.8174, 4324.3354, 5625.489, 5614.844, 5551.6343, 5410.4297, 2229.346]
2025-05-09 19:26:10,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 747.0, 1000.0, 802.0, 1000.0, 997.0, 1000.0, 1000.0, 408.0]
2025-05-09 19:26:10,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (4970.60) for latency MM1Queue_a033_s075
2025-05-09 19:26:10,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 19:26:10,464 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 19:26:10,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 10 hours, 15 minutes, 28 seconds)
2025-05-09 19:36:18,869 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:36:18,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:39:05,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3853.32544 ± 1813.612
2025-05-09 19:39:05,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1012.66205, 4101.2036, 1496.5846, 5379.0356, 5378.0693, 5502.75, 5345.5015, 3586.7905, 5493.6724, 1236.9843]
2025-05-09 19:39:05,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 744.0, 284.0, 1000.0, 1000.0, 1000.0, 1000.0, 687.0, 1000.0, 240.0]
2025-05-09 19:39:05,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 10 hours, 4 minutes, 13 seconds)
2025-05-09 19:49:27,863 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 19:49:28,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 19:52:13,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3842.29688 ± 1441.047
2025-05-09 19:52:13,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [3749.6155, 3203.8374, 948.745, 2380.3496, 2791.278, 4588.1616, 5443.507, 5500.5767, 4364.4214, 5452.478]
2025-05-09 19:52:13,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [710.0, 592.0, 170.0, 431.0, 526.0, 842.0, 1000.0, 1000.0, 797.0, 1000.0]
2025-05-09 19:52:13,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 9 hours, 52 minutes, 9 seconds)
2025-05-09 20:01:58,807 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:01:58,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:05:27,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4780.55371 ± 843.288
2025-05-09 20:05:27,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5286.5146, 5499.7295, 5390.6294, 2850.4583, 5263.705, 5361.8506, 4316.2207, 4271.7695, 4009.3926, 5555.2627]
2025-05-09 20:05:27,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 521.0, 1000.0, 1000.0, 789.0, 775.0, 732.0, 1000.0]
2025-05-09 20:05:27,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 9 hours, 56 minutes, 45 seconds)
2025-05-09 20:16:12,757 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:16:12,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:19:29,878 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4748.17871 ± 1453.066
2025-05-09 20:19:29,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5598.9033, 5656.9556, 5712.462, 2750.2166, 1394.878, 5467.112, 5606.849, 5687.0605, 4011.1367, 5596.212]
2025-05-09 20:19:29,889 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 493.0, 258.0, 1000.0, 1000.0, 1000.0, 711.0, 1000.0]
2025-05-09 20:19:29,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 9 hours, 49 minutes, 4 seconds)
2025-05-09 20:29:18,938 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:29:18,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:32:38,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4764.75488 ± 1551.626
2025-05-09 20:32:38,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5480.0176, 5533.325, 2703.415, 5535.5093, 5451.1245, 847.2517, 5535.1235, 5487.969, 5497.774, 5576.0396]
2025-05-09 20:32:38,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 485.0, 1000.0, 1000.0, 160.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 20:32:38,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 9 hours, 31 minutes, 41 seconds)
2025-05-09 20:42:53,175 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:42:53,180 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 20:46:43,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5394.16553 ± 660.143
2025-05-09 20:46:43,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5692.863, 5634.0503, 5624.366, 5613.934, 5527.4, 5621.4185, 5622.2046, 5655.4956, 3418.9624, 5530.962]
2025-05-09 20:46:43,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 642.0, 1000.0]
2025-05-09 20:46:43,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (5394.17) for latency MM1Queue_a033_s075
2025-05-09 20:46:43,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 20:46:43,551 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 20:46:43,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 9 hours, 28 minutes, 11 seconds)
2025-05-09 20:56:48,447 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 20:56:48,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:00:07,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4784.92822 ± 1247.599
2025-05-09 21:00:07,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5639.902, 5613.5093, 2868.7188, 5517.376, 5502.139, 5518.0625, 5665.2183, 5194.0728, 4326.51, 2003.7728]
2025-05-09 21:00:07,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 527.0, 1000.0, 1000.0, 1000.0, 1000.0, 921.0, 774.0, 359.0]
2025-05-09 21:00:07,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 9 hours, 16 minutes, 45 seconds)
2025-05-09 21:10:14,904 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:10:14,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:13:39,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4767.04736 ± 1350.509
2025-05-09 21:13:39,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2559.015, 5520.7524, 5511.8506, 5535.1455, 5517.211, 5503.496, 5461.7827, 1666.9333, 5399.7812, 4994.5024]
2025-05-09 21:13:39,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [504.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 306.0, 1000.0, 918.0]
2025-05-09 21:13:39,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 9 hours, 5 minutes, 41 seconds)
2025-05-09 21:23:58,498 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:23:58,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:26:53,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4148.33496 ± 1750.436
2025-05-09 21:26:53,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2359.8962, 1093.6235, 5498.02, 5572.1655, 5455.2563, 5506.755, 3647.1492, 1478.1355, 5506.741, 5365.6104]
2025-05-09 21:26:53,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [428.0, 212.0, 1000.0, 1000.0, 1000.0, 1000.0, 675.0, 288.0, 1000.0, 1000.0]
2025-05-09 21:26:53,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 8 hours, 45 minutes, 42 seconds)
2025-05-09 21:37:33,882 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:37:33,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:41:25,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5572.35156 ± 87.775
2025-05-09 21:41:25,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5659.9067, 5604.7476, 5352.4575, 5479.4795, 5587.7617, 5640.3345, 5578.182, 5611.466, 5642.33, 5566.8535]
2025-05-09 21:41:25,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-09 21:41:25,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (5572.35) for latency MM1Queue_a033_s075
2025-05-09 21:41:25,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-09 21:41:25,582 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-09 21:41:25,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 8 hours, 42 minutes, 42 seconds)
2025-05-09 21:51:56,104 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 21:51:56,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 21:55:15,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4641.24854 ± 1485.166
2025-05-09 21:55:15,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5384.479, 5415.407, 5564.693, 5539.585, 5545.2314, 4424.171, 1643.0703, 1843.9409, 5557.4004, 5494.506]
2025-05-09 21:55:15,255 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 830.0, 307.0, 350.0, 1000.0, 1000.0]
2025-05-09 21:55:15,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 8 hours, 27 minutes, 6 seconds)
2025-05-09 22:04:29,663 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:04:29,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:08:11,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5301.08057 ± 792.722
2025-05-09 22:08:11,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5583.334, 5568.442, 5507.1426, 5543.8574, 5523.6772, 5523.9976, 5633.392, 5606.01, 2925.7185, 5595.2363]
2025-05-09 22:08:11,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 542.0, 1000.0]
2025-05-09 22:08:11,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 8 hours, 10 minutes, 4 seconds)
2025-05-09 22:19:13,541 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:19:13,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:22:45,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5038.97217 ± 1187.213
2025-05-09 22:22:45,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5601.8545, 5607.279, 5631.853, 2026.8022, 5674.2656, 5587.6704, 3486.4167, 5605.7217, 5584.7837, 5583.0737]
2025-05-09 22:22:45,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 385.0, 1000.0, 1000.0, 641.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:22:45,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 3 minutes, 40 seconds)
2025-05-09 22:32:30,182 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:32:30,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:35:23,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4121.14062 ± 1843.710
2025-05-09 22:35:23,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5527.9336, 2614.2026, 5547.442, 5384.9165, 1002.63995, 919.5688, 3642.9822, 5482.332, 5515.8315, 5573.559]
2025-05-09 22:35:23,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 477.0, 1000.0, 1000.0, 180.0, 176.0, 664.0, 1000.0, 1000.0, 1000.0]
2025-05-09 22:35:23,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 7 hours, 45 minutes, 45 seconds)
2025-05-09 22:45:18,796 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:45:18,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 22:48:57,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5125.61035 ± 1426.928
2025-05-09 22:48:58,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5525.2476, 5648.0723, 5595.9116, 5586.029, 5611.402, 5597.299, 5634.2427, 5648.1333, 5563.5654, 846.199]
2025-05-09 22:48:58,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 180.0]
2025-05-09 22:48:58,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 7 hours, 25 minutes, 46 seconds)
2025-05-09 22:59:10,327 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 22:59:10,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:01:56,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3850.88354 ± 1940.793
2025-05-09 23:01:56,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5558.425, 5511.4346, 4111.3125, 5417.0386, 929.4746, 5546.934, 3766.0483, 1574.6522, 5453.013, 640.50543]
2025-05-09 23:01:56,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 760.0, 1000.0, 198.0, 1000.0, 679.0, 301.0, 1000.0, 128.0]
2025-05-09 23:01:56,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 7 hours, 6 minutes, 46 seconds)
2025-05-09 23:12:28,930 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:12:28,935 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:16:01,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4931.32080 ± 1395.154
2025-05-09 23:16:01,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5537.0337, 5337.701, 4654.104, 5497.35, 5472.2437, 5497.7876, 5467.5386, 5582.684, 5451.229, 815.536]
2025-05-09 23:16:01,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 884.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 162.0]
2025-05-09 23:16:01,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 7 hours, 33 seconds)
2025-05-09 23:26:27,021 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:26:27,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:30:01,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5178.15869 ± 1202.739
2025-05-09 23:30:01,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5617.905, 5480.82, 5577.4727, 5547.1743, 5556.456, 5564.4927, 1572.2703, 5611.299, 5618.311, 5635.385]
2025-05-09 23:30:01,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 283.0, 1000.0, 1000.0, 1000.0]
2025-05-09 23:30:01,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 6 hours, 43 minutes, 35 seconds)
2025-05-09 23:40:29,500 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:40:29,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:44:01,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5130.48730 ± 1454.093
2025-05-09 23:44:01,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5561.4204, 5614.939, 5663.053, 5598.909, 5491.8726, 5682.7695, 5647.4487, 771.01544, 5639.252, 5634.194]
2025-05-09 23:44:01,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 149.0, 1000.0, 1000.0]
2025-05-09 23:44:01,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 6 hours, 38 minutes, 4 seconds)
2025-05-09 23:54:02,296 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-09 23:54:02,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-09 23:56:29,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 3508.68286 ± 2203.954
2025-05-09 23:56:29,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5689.098, 369.28522, 5567.7705, 5511.7974, 2729.4016, 2262.0137, 1310.1503, 5637.0493, 389.8179, 5620.443]
2025-05-09 23:56:29,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 65.0, 1000.0, 1000.0, 492.0, 424.0, 248.0, 1000.0, 82.0, 1000.0]
2025-05-09 23:56:29,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 6 hours, 18 minutes, 7 seconds)
2025-05-10 00:06:40,787 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:06:40,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:10:15,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5206.50488 ± 1328.112
2025-05-10 00:10:15,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5657.729, 5644.0107, 5665.615, 5601.0566, 5672.1973, 1222.579, 5634.573, 5651.8325, 5652.9316, 5662.528]
2025-05-10 00:10:15,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 230.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:10:15,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 6 hours, 8 minutes, 55 seconds)
2025-05-10 00:20:28,696 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:20:28,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:24:22,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5558.82861 ± 35.773
2025-05-10 00:24:22,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5543.3154, 5585.7085, 5502.14, 5563.6494, 5563.3774, 5496.908, 5617.6978, 5582.3857, 5583.9023, 5549.2017]
2025-05-10 00:24:22,445 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 00:24:22,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 5 hours, 55 minutes, 25 seconds)
2025-05-10 00:34:34,537 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:34:34,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:38:18,614 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5191.43604 ± 885.575
2025-05-10 00:38:19,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5447.941, 5529.1104, 5512.776, 5496.743, 5426.116, 5555.9033, 5481.279, 2539.7134, 5551.128, 5373.649]
2025-05-10 00:38:19,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 462.0, 1000.0, 1000.0]
2025-05-10 00:38:19,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 5 hours, 41 minutes, 28 seconds)
2025-05-10 00:48:01,399 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 00:48:01,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 00:51:45,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5497.11182 ± 524.585
2025-05-10 00:51:45,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5709.5605, 5677.6743, 5666.169, 5676.1313, 5686.566, 5700.8076, 5696.81, 5594.7812, 5636.2876, 3926.3298]
2025-05-10 00:51:45,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 690.0]
2025-05-10 00:51:45,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 5 hours, 25 minutes, 5 seconds)
2025-05-10 01:01:55,140 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:01:55,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:05:47,286 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5674.53906 ± 46.776
2025-05-10 01:05:47,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5646.0034, 5652.5337, 5701.386, 5612.77, 5592.4736, 5699.4404, 5749.502, 5694.643, 5670.6133, 5726.028]
2025-05-10 01:05:47,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:05:47,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (5674.54) for latency MM1Queue_a033_s075
2025-05-10 01:05:47,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 01:05:47,298 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 01:05:47,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 5 hours, 18 minutes, 46 seconds)
2025-05-10 01:15:59,478 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:15:59,484 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:19:19,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4717.80762 ± 1509.943
2025-05-10 01:19:19,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5537.402, 5557.3145, 5694.08, 1787.6318, 5446.868, 5612.2817, 4555.066, 1737.8373, 5615.777, 5633.8164]
2025-05-10 01:19:19,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 353.0, 1000.0, 1000.0, 818.0, 338.0, 1000.0, 1000.0]
2025-05-10 01:19:19,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 3 minutes, 55 seconds)
2025-05-10 01:29:16,256 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:29:16,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:32:53,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5174.51270 ± 1344.449
2025-05-10 01:32:53,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5613.1914, 1143.6617, 5717.4, 5665.173, 5542.5903, 5611.823, 5557.089, 5621.015, 5624.988, 5648.1934]
2025-05-10 01:32:53,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 206.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:32:53,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 4 hours, 47 minutes, 46 seconds)
2025-05-10 01:43:42,772 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:43:42,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 01:47:37,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5648.17725 ± 24.170
2025-05-10 01:47:37,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5693.472, 5605.127, 5644.272, 5629.767, 5637.751, 5653.933, 5641.083, 5677.9727, 5633.38, 5665.021]
2025-05-10 01:47:37,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 01:47:37,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 4 hours, 37 minutes, 14 seconds)
2025-05-10 01:57:32,432 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 01:57:32,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:01:09,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5248.27490 ± 846.179
2025-05-10 02:01:09,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5656.728, 5686.584, 5667.74, 5578.984, 5642.99, 3244.2131, 5748.017, 5697.153, 3927.4163, 5632.922]
2025-05-10 02:01:09,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 571.0, 1000.0, 1000.0, 702.0, 1000.0]
2025-05-10 02:01:09,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 4 hours, 23 minutes, 46 seconds)
2025-05-10 02:11:22,653 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:11:22,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:15:21,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5667.54248 ± 54.544
2025-05-10 02:15:21,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5640.94, 5552.641, 5709.5146, 5679.985, 5624.955, 5720.219, 5745.403, 5623.4756, 5675.038, 5703.253]
2025-05-10 02:15:21,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:15:22,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 10 minutes, 30 seconds)
2025-05-10 02:25:36,698 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:25:36,704 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:29:18,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5423.64404 ± 576.677
2025-05-10 02:29:18,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5607.018, 5636.106, 5177.853, 5687.816, 3751.0688, 5668.878, 5648.8154, 5705.9053, 5657.625, 5695.3555]
2025-05-10 02:29:18,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 910.0, 1000.0, 668.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:29:18,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 3 hours, 57 minutes, 55 seconds)
2025-05-10 02:39:30,889 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:39:30,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:43:11,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5241.99268 ± 1082.380
2025-05-10 02:43:11,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5665.3564, 5583.6865, 1995.7008, 5576.892, 5602.3965, 5582.4346, 5604.8574, 5624.7944, 5587.4316, 5596.372]
2025-05-10 02:43:11,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 360.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:43:11,518 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 3 hours, 44 minutes, 57 seconds)
2025-05-10 02:53:17,394 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 02:53:17,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 02:57:10,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5616.87793 ± 65.779
2025-05-10 02:57:10,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5662.918, 5640.8145, 5649.0254, 5666.4966, 5523.412, 5631.36, 5533.3916, 5502.3325, 5689.4526, 5669.5747]
2025-05-10 02:57:10,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 02:57:10,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 28 minutes, 37 seconds)
2025-05-10 03:07:21,630 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:07:21,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:11:09,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5638.73633 ± 178.394
2025-05-10 03:11:09,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5754.567, 5203.352, 5718.67, 5676.754, 5656.1206, 5403.8877, 5792.842, 5685.009, 5715.096, 5781.063]
2025-05-10 03:11:09,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 918.0, 1000.0, 1000.0, 1000.0, 975.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:11:09,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 16 minutes)
2025-05-10 03:21:21,019 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:21:21,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:24:57,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5013.72461 ± 956.272
2025-05-10 03:24:57,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [3023.6968, 5456.815, 5550.573, 5346.8936, 5508.264, 5534.1025, 5542.913, 5551.6313, 3189.365, 5432.9937]
2025-05-10 03:24:57,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [561.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 608.0, 1000.0]
2025-05-10 03:24:57,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 54 seconds)
2025-05-10 03:35:16,896 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:35:16,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:38:52,095 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5288.25830 ± 1303.695
2025-05-10 03:38:52,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5725.0493, 5732.345, 5685.4546, 5743.7734, 1380.6682, 5775.8896, 5754.1616, 5584.0996, 5794.4395, 5706.6987]
2025-05-10 03:38:52,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 243.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:38:52,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 2 hours, 46 minutes, 56 seconds)
2025-05-10 03:48:46,883 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 03:48:47,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 03:52:38,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5805.59668 ± 23.044
2025-05-10 03:52:38,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5800.9365, 5784.728, 5799.7, 5824.7246, 5801.3555, 5815.447, 5771.8926, 5857.8003, 5813.945, 5785.432]
2025-05-10 03:52:38,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 03:52:38,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1226 [INFO]: New best (5805.60) for latency MM1Queue_a033_s075
2025-05-10 03:52:38,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1229 [INFO]: saving network
2025-05-10 03:52:38,039 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc8/noisy-humanoid/MM1Queue_a033_s075-mbpac-highdim-memdelay/checkpoints/best_MM1Queue_a033_s075.pkl
2025-05-10 03:52:38,062 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 32 minutes, 46 seconds)
2025-05-10 04:02:51,983 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:02:51,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:06:33,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5302.99414 ± 938.318
2025-05-10 04:06:33,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [2492.9849, 5576.9844, 5664.3726, 5632.2666, 5620.641, 5597.088, 5640.382, 5662.9517, 5472.204, 5670.0654]
2025-05-10 04:06:33,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [451.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:06:33,899 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 18 minutes, 47 seconds)
2025-05-10 04:16:46,329 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:16:46,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:19:50,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 4293.96729 ± 1756.060
2025-05-10 04:19:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [1676.3625, 5349.273, 827.144, 4858.0156, 2587.519, 5511.285, 5526.088, 5514.3486, 5591.645, 5497.9917]
2025-05-10 04:19:50,876 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [313.0, 1000.0, 149.0, 879.0, 483.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:19:50,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 3 minutes, 37 seconds)
2025-05-10 04:30:03,784 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:30:03,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:33:54,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5712.91992 ± 32.388
2025-05-10 04:33:54,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5772.143, 5724.239, 5674.176, 5712.237, 5734.8643, 5719.1445, 5733.518, 5700.5884, 5647.726, 5710.565]
2025-05-10 04:33:54,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 04:33:54,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 1 hour, 50 minutes, 19 seconds)
2025-05-10 04:44:08,527 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:44:08,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 04:47:58,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5446.31152 ± 601.783
2025-05-10 04:47:58,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5634.586, 5600.7505, 5675.8525, 5735.6206, 5728.23, 5632.005, 5577.4165, 5584.487, 5646.5767, 3647.5923]
2025-05-10 04:47:58,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 648.0]
2025-05-10 04:47:58,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 36 minutes, 45 seconds)
2025-05-10 04:58:21,240 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 04:58:21,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:02:12,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5655.51221 ± 37.870
2025-05-10 05:02:13,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5688.358, 5611.9673, 5636.3506, 5623.1396, 5692.8965, 5685.685, 5625.9624, 5639.688, 5622.072, 5729.005]
2025-05-10 05:02:13,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:02:13,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 23 minutes, 30 seconds)
2025-05-10 05:12:24,273 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:12:24,278 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:16:15,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5734.02246 ± 46.391
2025-05-10 05:16:15,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5719.6416, 5797.997, 5732.039, 5706.256, 5662.2144, 5794.5054, 5745.231, 5707.041, 5795.7515, 5679.5454]
2025-05-10 05:16:15,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:16:15,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 9 minutes, 41 seconds)
2025-05-10 05:26:37,494 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:26:37,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:30:34,802 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5687.96045 ± 18.938
2025-05-10 05:30:34,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5684.5264, 5718.7866, 5691.988, 5721.8604, 5663.2393, 5691.6562, 5663.715, 5689.3447, 5679.9565, 5674.532]
2025-05-10 05:30:34,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:30:34,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 56 minutes, 35 seconds)
2025-05-10 05:40:47,970 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:40:47,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:44:38,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5616.03662 ± 36.502
2025-05-10 05:44:38,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5621.433, 5616.7095, 5626.9517, 5591.5615, 5612.142, 5671.042, 5545.774, 5672.538, 5623.0225, 5579.1934]
2025-05-10 05:44:38,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:44:38,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 42 minutes, 26 seconds)
2025-05-10 05:54:50,590 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 05:54:50,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 05:58:44,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5667.91406 ± 26.449
2025-05-10 05:58:45,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5710.8047, 5667.1323, 5687.976, 5664.376, 5672.6006, 5617.9814, 5633.609, 5663.4336, 5661.8267, 5699.3984]
2025-05-10 05:58:45,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 05:58:45,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 28 minutes, 18 seconds)
2025-05-10 06:09:01,829 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:09:02,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:12:44,596 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5394.61816 ± 596.030
2025-05-10 06:12:44,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5673.7197, 5706.0083, 5671.993, 5712.107, 5694.785, 5700.1016, 5692.379, 5687.4253, 4259.292, 4148.3677]
2025-05-10 06:12:44,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 743.0, 731.0]
2025-05-10 06:12:44,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 14 minutes, 6 seconds)
2025-05-10 06:22:41,640 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-10 06:22:41,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-05-10 06:26:28,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1221 [DEBUG]: Total Reward: 5505.14551 ± 745.134
2025-05-10 06:26:28,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1222 [DEBUG]: All rewards: [5776.5786, 3270.684, 5740.073, 5763.855, 5757.6494, 5775.9536, 5781.6045, 5715.764, 5748.8013, 5720.489]
2025-05-10 06:26:28,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 579.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-10 06:26:28,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-humanoid):1251 [DEBUG]: Training session finished
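A note on reading the evaluation lines above: the `Total Reward: mean ± std` values appear to be the mean and *population* standard deviation (ddof=0) of the ten per-episode returns in the matching `All rewards` line. A minimal sketch cross-checking this against iteration 80 (logged as `5648.17725 ± 24.170`); the tiny mean discrepancy in the last digits is presumably float32 accumulation inside the trainer:

```python
# Cross-check a logged evaluation summary against its per-episode rewards.
# Data copied verbatim from the iteration-80 "All rewards" line above.
from statistics import fmean, pstdev

rewards = [5693.472, 5605.127, 5644.272, 5629.767, 5637.751,
           5653.933, 5641.083, 5677.9727, 5633.38, 5665.021]

mean = fmean(rewards)   # logged: 5648.17725 (float64 here gives ~5648.178)
std = pstdev(rewards)   # population std (ddof=0); logged: 24.170

print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")
```

The sample standard deviation (`statistics.stdev`, ddof=1) would give roughly 25.5 instead, so the ± figures in this log are best read as population spread over the 10 evaluation episodes, not a standard error of the mean.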
