2026-01-22 23:01:34,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-mbpac_memdelay
2026-01-22 23:01:34,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-mbpac_memdelay
2026-01-22 23:01:34,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x152f564aa110>}
2026-01-22 23:01:34,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-22 23:01:34,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-22 23:01:34,263 baseline-mbpac-noisy-hopper:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-22 23:01:34,263 baseline-mbpac-noisy-hopper:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:01:34,270 baseline-mbpac-noisy-hopper:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=3, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2026-01-22 23:01:35,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-22 23:01:35,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-22 23:13:18,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:13:18,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:13:30,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 62.31176 ± 12.091
2026-01-22 23:13:30,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [65.34345, 82.53663, 49.207912, 62.46925, 66.63143, 61.70311, 56.109356, 43.82962, 53.222473, 82.064415]
2026-01-22 23:13:30,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [36.0, 45.0, 28.0, 34.0, 37.0, 35.0, 32.0, 25.0, 34.0, 44.0]
2026-01-22 23:13:30,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (62.31) for latency DatasetOffice
2026-01-22 23:13:30,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 19 hours, 41 minutes, 13 seconds)
2026-01-22 23:25:44,152 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:25:44,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:25,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 242.97417 ± 111.466
2026-01-22 23:26:25,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [167.6922, 296.7535, 182.23097, 464.46103, 165.55711, 56.855824, 274.97052, 291.67758, 359.53842, 170.00468]
2026-01-22 23:26:25,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [90.0, 134.0, 93.0, 196.0, 89.0, 39.0, 128.0, 134.0, 151.0, 105.0]
2026-01-22 23:26:25,352 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (242.97) for latency DatasetOffice
2026-01-22 23:26:25,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 20 hours, 17 minutes, 3 seconds)
2026-01-22 23:38:42,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:38:42,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:58,428 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 950.87659 ± 145.514
2026-01-22 23:43:58,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [558.6412, 1021.7164, 804.67914, 1022.16364, 1027.6858, 1016.988, 1012.81604, 1019.7079, 1006.6969, 1017.67114]
2026-01-22 23:43:58,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [372.0, 1000.0, 676.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:43:58,429 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (950.88) for latency DatasetOffice
2026-01-22 23:43:58,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 22 hours, 50 minutes, 34 seconds)
2026-01-22 23:55:47,129 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-22 23:55:47,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:27,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 843.64392 ± 170.735
2026-01-23 00:00:27,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [983.0676, 1024.7561, 851.52325, 824.17426, 761.9355, 827.64056, 1000.9172, 442.49866, 708.79175, 1011.1351]
2026-01-23 00:00:27,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 746.0, 677.0, 633.0, 693.0, 1000.0, 443.0, 677.0, 1000.0]
2026-01-23 00:00:27,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 23 hours, 32 minutes, 58 seconds)
2026-01-23 00:12:43,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:12:43,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:11,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 381.97549 ± 192.148
2026-01-23 00:14:11,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [239.3113, 313.72034, 775.6133, 637.6707, 457.89807, 307.36557, 81.01971, 420.53223, 342.0579, 244.56566]
2026-01-23 00:14:11,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [145.0, 159.0, 571.0, 453.0, 292.0, 246.0, 49.0, 243.0, 172.0, 174.0]
2026-01-23 00:14:11,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 22 hours, 59 minutes, 36 seconds)
2026-01-23 00:25:53,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:25:53,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:09,422 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 407.91644 ± 67.465
2026-01-23 00:27:09,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [417.30328, 425.40225, 328.49185, 305.34494, 461.77075, 397.10278, 373.46204, 471.86246, 540.28296, 358.14133]
2026-01-23 00:27:09,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [237.0, 245.0, 135.0, 120.0, 191.0, 219.0, 198.0, 258.0, 355.0, 172.0]
2026-01-23 00:27:09,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 23 hours, 4 minutes, 26 seconds)
2026-01-23 00:39:11,735 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:39:11,737 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:20,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 407.61530 ± 48.962
2026-01-23 00:40:20,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [414.3559, 362.91083, 403.1331, 449.47388, 349.3496, 438.48303, 373.68155, 441.40378, 340.68597, 502.67526]
2026-01-23 00:40:20,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [204.0, 174.0, 191.0, 197.0, 145.0, 218.0, 194.0, 203.0, 143.0, 292.0]
2026-01-23 00:40:20,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 22 hours, 55 minutes, 1 second)
2026-01-23 00:52:29,912 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 00:52:29,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:43,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 448.03369 ± 75.652
2026-01-23 00:53:43,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [408.80804, 525.9033, 541.5575, 406.35764, 323.62827, 504.2558, 344.9933, 545.05566, 426.73514, 453.04224]
2026-01-23 00:53:43,080 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [194.0, 236.0, 233.0, 184.0, 144.0, 261.0, 150.0, 277.0, 190.0, 210.0]
2026-01-23 00:53:43,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 21 hours, 23 minutes, 17 seconds)
2026-01-23 01:05:51,995 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:05:51,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:38,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 320.83710 ± 211.198
2026-01-23 01:06:38,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [534.6507, 397.24274, 547.9292, 237.65752, 555.34576, 577.0804, 62.062424, 114.67567, 83.6408, 98.08543]
2026-01-23 01:06:38,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [190.0, 173.0, 194.0, 101.0, 211.0, 224.0, 37.0, 73.0, 55.0, 63.0]
2026-01-23 01:06:38,293 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 20 hours, 4 minutes, 28 seconds)
2026-01-23 01:18:39,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:18:39,264 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:56,154 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 534.96771 ± 329.388
2026-01-23 01:19:56,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1484.752, 458.37357, 511.59167, 374.03848, 564.62054, 564.56085, 297.40594, 420.09338, 358.1968, 316.04413]
2026-01-23 01:19:56,155 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [587.0, 193.0, 178.0, 184.0, 224.0, 210.0, 135.0, 163.0, 165.0, 143.0]
2026-01-23 01:19:56,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 19 hours, 43 minutes, 19 seconds)
2026-01-23 01:32:09,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:32:09,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:30,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 621.43005 ± 92.226
2026-01-23 01:33:30,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [772.1133, 547.72595, 528.0691, 666.8659, 724.55505, 698.9872, 550.826, 683.528, 534.9493, 506.68027]
2026-01-23 01:33:30,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [277.0, 204.0, 212.0, 245.0, 250.0, 251.0, 204.0, 266.0, 200.0, 185.0]
2026-01-23 01:33:30,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 19 hours, 41 minutes, 6 seconds)
2026-01-23 01:45:29,295 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:45:29,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:02,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 817.04156 ± 55.708
2026-01-23 01:47:02,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [849.1545, 899.16284, 771.7343, 766.0186, 736.2924, 827.97015, 885.88837, 849.86707, 742.62604, 841.70135]
2026-01-23 01:47:02,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [273.0, 286.0, 249.0, 254.0, 243.0, 270.0, 295.0, 258.0, 227.0, 265.0]
2026-01-23 01:47:02,654 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 19 hours, 33 minutes, 50 seconds)
2026-01-23 01:59:10,306 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 01:59:10,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:21,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 657.85962 ± 77.882
2026-01-23 02:00:21,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [777.42395, 629.1729, 628.22, 565.8743, 638.17175, 631.10834, 830.1379, 592.19025, 624.8625, 661.43463]
2026-01-23 02:00:21,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [238.0, 201.0, 192.0, 174.0, 188.0, 200.0, 246.0, 182.0, 186.0, 208.0]
2026-01-23 02:00:21,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 19 hours, 19 minutes, 40 seconds)
2026-01-23 02:12:07,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:12:07,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:05,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 373.55157 ± 222.668
2026-01-23 02:13:05,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [585.04565, 167.07893, 353.80383, 172.58849, 139.28192, 584.7081, 515.8931, 141.45595, 273.1584, 802.5013]
2026-01-23 02:13:05,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [233.0, 89.0, 161.0, 89.0, 77.0, 217.0, 234.0, 79.0, 133.0, 364.0]
2026-01-23 02:13:05,228 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 19 hours, 2 minutes, 55 seconds)
2026-01-23 02:25:01,696 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:25:01,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:48,994 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 914.41943 ± 288.066
2026-01-23 02:26:48,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [725.1461, 924.60126, 759.3291, 931.154, 417.45303, 1433.8115, 1115.036, 1262.616, 959.46, 615.5867]
2026-01-23 02:26:48,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [286.0, 304.0, 232.0, 336.0, 178.0, 491.0, 365.0, 383.0, 335.0, 202.0]
2026-01-23 02:26:49,000 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 18 hours, 56 minutes, 58 seconds)
2026-01-23 02:39:13,510 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:39:13,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:20,868 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2048.39844 ± 981.219
2026-01-23 02:43:20,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2957.4595, 2905.0933, 832.8808, 2913.9463, 2468.3645, 2778.233, 2943.5903, 1358.4885, 751.579, 574.3491]
2026-01-23 02:43:20,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 312.0, 1000.0, 813.0, 934.0, 1000.0, 509.0, 326.0, 235.0]
2026-01-23 02:43:20,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (2048.40) for latency DatasetOffice
2026-01-23 02:43:20,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 19 hours, 33 minutes, 15 seconds)
2026-01-23 02:55:18,926 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 02:55:18,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:03,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2862.60205 ± 47.025
2026-01-23 03:01:03,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2896.1255, 2769.397, 2848.5442, 2857.3535, 2807.5566, 2892.1536, 2891.9478, 2947.8086, 2858.6707, 2856.4622]
2026-01-23 03:01:03,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 967.0, 1000.0, 1000.0]
2026-01-23 03:01:03,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (2862.60) for latency DatasetOffice
2026-01-23 03:01:03,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 20 hours, 28 minutes, 43 seconds)
2026-01-23 03:12:41,350 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:12:41,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:18,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2728.58667 ± 361.757
2026-01-23 03:18:18,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2897.8037, 2843.6455, 2873.6602, 2817.637, 2809.779, 2974.2932, 1655.5979, 2766.906, 2814.226, 2832.3193]
2026-01-23 03:18:18,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 590.0, 1000.0, 1000.0, 1000.0]
2026-01-23 03:18:18,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 18 minutes, 14 seconds)
2026-01-23 03:30:17,733 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:30:17,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:14,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1072.51111 ± 346.288
2026-01-23 03:32:14,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [956.2558, 657.6643, 1315.6698, 1265.4939, 885.09064, 1135.522, 1332.8668, 1094.4674, 1676.5812, 405.49936]
2026-01-23 03:32:14,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [303.0, 209.0, 404.0, 393.0, 277.0, 360.0, 404.0, 333.0, 527.0, 162.0]
2026-01-23 03:32:14,867 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 22 minutes, 24 seconds)
2026-01-23 03:43:53,149 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:43:53,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:28,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1421.44214 ± 656.712
2026-01-23 03:46:28,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [123.08887, 2330.3328, 1723.4803, 843.63, 787.0429, 1580.2554, 2297.8938, 1384.0425, 1314.4268, 1830.2284]
2026-01-23 03:46:28,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [71.0, 728.0, 530.0, 276.0, 249.0, 498.0, 707.0, 427.0, 404.0, 570.0]
2026-01-23 03:46:28,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 14 minutes, 29 seconds)
2026-01-23 03:59:24,187 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 03:59:24,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:03:57,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2377.38672 ± 1061.581
2026-01-23 04:03:57,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2970.8013, 3016.892, 3048.5764, 3043.0806, 1743.3059, 13.163753, 846.6792, 3048.3213, 3013.3047, 3029.7432]
2026-01-23 04:03:57,282 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 594.0, 13.0, 267.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:03:57,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 13 minutes, 35 seconds)
2026-01-23 04:15:42,994 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:15:42,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:20:43,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2690.05688 ± 702.475
2026-01-23 04:20:43,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3051.5996, 3100.5886, 1628.9636, 3103.5361, 3152.4116, 2548.9136, 1065.8434, 3036.9404, 3124.0442, 3087.7263]
2026-01-23 04:20:43,517 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 541.0, 1000.0, 1000.0, 781.0, 374.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:20:43,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 42 minutes, 43 seconds)
2026-01-23 04:32:37,431 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:32:37,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:29,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2089.15894 ± 983.199
2026-01-23 04:36:29,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1360.9352, 1415.7358, 923.3736, 3020.0964, 3121.2944, 993.406, 3028.9563, 3014.9426, 902.49243, 3110.3564]
2026-01-23 04:36:29,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [426.0, 425.0, 289.0, 1000.0, 1000.0, 315.0, 1000.0, 1000.0, 283.0, 1000.0]
2026-01-23 04:36:29,891 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 4 minutes, 7 seconds)
2026-01-23 04:47:55,436 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 04:47:55,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:52:07,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2277.75830 ± 975.187
2026-01-23 04:52:07,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3130.2922, 3121.1118, 3216.9028, 2703.5513, 1351.907, 3051.1082, 794.4413, 3031.2012, 1642.7045, 734.3626]
2026-01-23 04:52:07,974 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 996.0, 852.0, 416.0, 1000.0, 243.0, 1000.0, 520.0, 230.0]
2026-01-23 04:52:07,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 14 minutes, 15 seconds)
2026-01-23 05:03:46,815 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:03:46,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:06:50,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1650.20239 ± 1188.742
2026-01-23 05:06:50,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2021.718, 25.64023, 991.34326, 1998.8973, 3073.1033, 3103.038, 3157.7607, 1772.2607, 25.735071, 332.52725]
2026-01-23 05:06:50,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [629.0, 23.0, 303.0, 618.0, 1000.0, 1000.0, 1000.0, 588.0, 23.0, 134.0]
2026-01-23 05:06:50,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 5 minutes, 31 seconds)
2026-01-23 05:18:33,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:18:33,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:22:39,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2247.28589 ± 1066.051
2026-01-23 05:22:39,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1263.6642, 792.40155, 955.3454, 3106.7935, 3120.667, 3125.7236, 789.68317, 3094.5613, 3120.8162, 3103.2024]
2026-01-23 05:22:39,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [382.0, 239.0, 292.0, 1000.0, 1000.0, 1000.0, 244.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:22:39,609 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 24 minutes, 50 seconds)
2026-01-23 05:34:27,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:34:27,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:36:08,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 925.49725 ± 831.830
2026-01-23 05:36:08,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1050.9318, 38.175175, 887.7809, 1087.485, 126.64113, 3095.8916, 977.9802, 73.85578, 1027.7928, 888.43854]
2026-01-23 05:36:08,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [316.0, 30.0, 263.0, 322.0, 68.0, 1000.0, 333.0, 47.0, 309.0, 268.0]
2026-01-23 05:36:08,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 18 hours, 20 minutes, 57 seconds)
2026-01-23 05:47:45,770 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 05:47:45,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:52:04,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2377.20166 ± 1121.555
2026-01-23 05:52:04,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3145.344, 3015.519, 3071.1494, 3128.9812, 3081.1106, 961.8767, 56.42746, 3064.6333, 1111.3855, 3135.588]
2026-01-23 05:52:04,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 918.0, 1000.0, 1000.0, 1000.0, 291.0, 42.0, 1000.0, 332.0, 1000.0]
2026-01-23 05:52:04,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 18 hours, 8 minutes, 11 seconds)
2026-01-23 06:04:14,806 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:04:14,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:09:56,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 3020.20044 ± 29.536
2026-01-23 06:09:56,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3019.1174, 3043.2766, 3024.4583, 3022.613, 3044.5208, 3030.7407, 2955.4985, 2991.0608, 3003.9866, 3066.7324]
2026-01-23 06:09:56,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:09:56,427 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (3020.20) for latency DatasetOffice
2026-01-23 06:09:56,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 18 hours, 24 minutes, 52 seconds)
2026-01-23 06:21:46,635 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:21:46,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:26:05,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2398.26440 ± 825.605
2026-01-23 06:26:05,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1120.5364, 3090.362, 1404.9208, 3045.7317, 1714.0446, 3094.8728, 3052.6804, 3042.1387, 1362.9352, 3054.4211]
2026-01-23 06:26:05,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [332.0, 1000.0, 425.0, 1000.0, 507.0, 1000.0, 1000.0, 1000.0, 400.0, 1000.0]
2026-01-23 06:26:05,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 18 hours, 29 minutes, 28 seconds)
2026-01-23 06:37:39,869 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:37:39,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:41:27,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2066.86719 ± 1039.728
2026-01-23 06:41:27,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [913.38293, 3078.0378, 930.0682, 3067.7727, 916.7755, 961.60406, 3086.4448, 3120.2163, 1471.4312, 3122.9365]
2026-01-23 06:41:27,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [272.0, 1000.0, 284.0, 1000.0, 274.0, 296.0, 1000.0, 1000.0, 489.0, 956.0]
2026-01-23 06:41:27,333 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 7 minutes, 22 seconds)
2026-01-23 06:52:58,736 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 06:52:58,745 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:58:22,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2882.77344 ± 229.126
2026-01-23 06:58:22,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2734.2024, 2974.5374, 3025.6663, 3022.0762, 3022.9326, 2962.696, 2728.542, 2284.702, 3042.8093, 3029.5703]
2026-01-23 06:58:22,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [855.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 917.0, 764.0, 1000.0, 1000.0]
2026-01-23 06:58:22,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 38 minutes, 31 seconds)
2026-01-23 07:10:56,590 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:10:56,595 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:13:56,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1615.60034 ± 958.575
2026-01-23 07:13:56,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3065.8254, 1144.4706, 875.36926, 867.38306, 3051.1807, 1093.5387, 1092.2036, 889.92944, 973.8922, 3102.2097]
2026-01-23 07:13:56,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 342.0, 270.0, 264.0, 1000.0, 323.0, 336.0, 273.0, 288.0, 1000.0]
2026-01-23 07:13:56,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 17 minutes, 8 seconds)
2026-01-23 07:25:37,669 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:25:37,671 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:30:28,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2496.95215 ± 801.737
2026-01-23 07:30:28,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1357.9802, 3041.99, 1336.2963, 3010.476, 3038.2795, 2999.7397, 3041.6108, 2988.707, 3022.2134, 1132.2283]
2026-01-23 07:30:28,065 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [399.0, 1000.0, 459.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 337.0]
2026-01-23 07:30:28,074 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 17 hours, 42 minutes, 57 seconds)
2026-01-23 07:42:56,866 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:42:56,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:46:21,589 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1779.58521 ± 1051.819
2026-01-23 07:46:21,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [220.85316, 3014.8574, 3016.2659, 1049.7124, 3050.7898, 1135.1533, 1039.4794, 1345.7542, 909.7816, 3013.2053]
2026-01-23 07:46:21,591 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [129.0, 1000.0, 1000.0, 308.0, 1000.0, 332.0, 307.0, 413.0, 292.0, 1000.0]
2026-01-23 07:46:21,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 23 minutes, 31 seconds)
2026-01-23 07:58:26,407 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 07:58:26,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:01:36,145 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1709.40955 ± 877.068
2026-01-23 08:01:36,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1233.0994, 990.88837, 1147.4103, 1636.0032, 1041.5167, 2931.0437, 3076.9285, 978.0397, 1009.87585, 3049.2893]
2026-01-23 08:01:36,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [362.0, 291.0, 340.0, 514.0, 314.0, 929.0, 1000.0, 296.0, 314.0, 1000.0]
2026-01-23 08:01:36,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 5 minutes, 52 seconds)
2026-01-23 08:13:51,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:13:51,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:16:43,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1545.05884 ± 823.160
2026-01-23 08:16:43,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1283.3583, 1209.6849, 3064.6228, 1186.3645, 1239.3221, 3076.2144, 1041.0823, 370.82785, 1289.1302, 1689.982]
2026-01-23 08:16:43,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [374.0, 368.0, 1000.0, 366.0, 366.0, 1000.0, 367.0, 145.0, 379.0, 562.0]
2026-01-23 08:16:43,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 16 hours, 27 minutes, 15 seconds)
2026-01-23 08:28:47,670 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:28:47,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:33:40,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2501.04004 ± 921.380
2026-01-23 08:33:40,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3068.0574, 3004.31, 1440.91, 3028.6829, 2996.6123, 3020.7556, 2230.6768, 3010.9917, 3028.894, 180.51013]
2026-01-23 08:33:40,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 494.0, 1000.0, 1000.0, 1000.0, 741.0, 1000.0, 1000.0, 87.0]
2026-01-23 08:33:40,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 16 hours, 28 minutes, 44 seconds)
2026-01-23 08:46:18,254 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 08:46:18,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:48:51,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1423.64868 ± 580.681
2026-01-23 08:48:51,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1452.4246, 2954.0112, 1424.2427, 962.34717, 1863.9028, 911.52655, 1396.2802, 1189.9369, 1117.7833, 964.0315]
2026-01-23 08:48:51,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [444.0, 904.0, 439.0, 289.0, 591.0, 268.0, 414.0, 348.0, 326.0, 291.0]
2026-01-23 08:48:51,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 15 hours, 56 minutes, 23 seconds)
2026-01-23 09:00:09,099 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:00:09,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:02:35,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1358.53259 ± 552.482
2026-01-23 09:02:35,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1116.6173, 1125.6718, 1416.8844, 850.0582, 1290.418, 664.6749, 1818.1676, 1321.0662, 1223.0353, 2758.7324]
2026-01-23 09:02:35,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [338.0, 337.0, 433.0, 252.0, 382.0, 233.0, 551.0, 382.0, 364.0, 833.0]
2026-01-23 09:02:35,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 15 hours, 14 minutes, 43 seconds)
2026-01-23 09:14:45,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:14:45,326 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:17:58,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1759.23608 ± 757.671
2026-01-23 09:17:58,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3133.751, 1073.18, 1180.0192, 3127.9106, 2020.4058, 2015.3273, 1158.9742, 1283.4933, 1420.1663, 1179.1332]
2026-01-23 09:17:58,403 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 324.0, 403.0, 1000.0, 619.0, 613.0, 343.0, 379.0, 440.0, 345.0]
2026-01-23 09:17:58,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 15 hours, 1 minute, 10 seconds)
2026-01-23 09:30:24,571 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:30:24,576 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:33:44,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1780.81958 ± 876.916
2026-01-23 09:33:44,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1032.2363, 902.5753, 2546.7173, 3031.3032, 1229.2198, 938.625, 2651.451, 3097.1492, 1144.0582, 1234.8584]
2026-01-23 09:33:44,594 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [308.0, 281.0, 784.0, 1000.0, 383.0, 300.0, 827.0, 1000.0, 348.0, 366.0]
2026-01-23 09:33:44,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 14 hours, 53 minutes, 19 seconds)
2026-01-23 09:45:51,839 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 09:45:51,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 09:51:18,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2816.10742 ± 462.135
2026-01-23 09:51:18,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1680.5562, 3033.368, 3069.7566, 3059.9563, 3044.0789, 3009.9426, 2154.8098, 3012.121, 3015.4263, 3081.058]
2026-01-23 09:51:18,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [566.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 707.0, 1000.0, 1000.0, 1000.0]
2026-01-23 09:51:18,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 14 hours, 44 minutes, 51 seconds)
2026-01-23 10:03:27,125 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:03:27,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:05:06,945 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 847.37958 ± 737.704
2026-01-23 10:05:06,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2154.2043, 89.80976, 1759.6797, 30.369484, 25.508698, 1007.8776, 729.49335, 113.11305, 1401.7308, 1162.0088]
2026-01-23 10:05:06,946 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [700.0, 63.0, 584.0, 31.0, 31.0, 344.0, 267.0, 61.0, 432.0, 338.0]
2026-01-23 10:05:06,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 14 hours, 14 minutes, 3 seconds)
2026-01-23 10:17:34,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:17:34,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:21:47,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2303.46094 ± 588.994
2026-01-23 10:21:47,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1813.9927, 2611.8325, 3101.49, 2479.7473, 3127.7722, 1624.1775, 2450.2915, 2453.4983, 1176.5924, 2195.214]
2026-01-23 10:21:47,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [546.0, 800.0, 1000.0, 754.0, 1000.0, 483.0, 752.0, 755.0, 360.0, 678.0]
2026-01-23 10:21:47,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 14 hours, 31 minutes, 13 seconds)
2026-01-23 10:33:36,807 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:33:36,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:37:11,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1977.91174 ± 916.829
2026-01-23 10:37:11,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [931.8004, 3053.0732, 3266.046, 1090.9087, 2672.5146, 1845.3898, 3136.9453, 1519.2065, 797.60333, 1465.6298]
2026-01-23 10:37:11,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [280.0, 980.0, 995.0, 331.0, 835.0, 562.0, 1000.0, 470.0, 246.0, 451.0]
2026-01-23 10:37:11,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 15 minutes, 34 seconds)
2026-01-23 10:49:31,550 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 10:49:31,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 10:52:03,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1293.93250 ± 1463.901
2026-01-23 10:52:03,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3111.4438, 3070.7546, 97.28733, 191.83742, 50.344036, 90.78938, 90.585, 73.26323, 3085.139, 3077.882]
2026-01-23 10:52:03,242 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 55.0, 87.0, 30.0, 53.0, 52.0, 44.0, 1000.0, 1000.0]
2026-01-23 10:52:03,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 13 hours, 50 minutes, 5 seconds)
2026-01-23 11:03:46,029 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:03:46,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:06:13,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1372.02808 ± 525.205
2026-01-23 11:06:13,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [827.5392, 554.89624, 1890.8892, 1386.0471, 2228.1646, 1128.272, 816.1759, 1253.6642, 1897.5688, 1737.0648]
2026-01-23 11:06:13,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [248.0, 204.0, 572.0, 420.0, 665.0, 336.0, 244.0, 402.0, 583.0, 539.0]
2026-01-23 11:06:13,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 12 hours, 59 minutes, 14 seconds)
2026-01-23 11:19:19,491 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:19:19,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:24:39,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2763.79175 ± 861.551
2026-01-23 11:24:39,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3064.1484, 3048.5276, 3051.4543, 3065.4836, 3054.7778, 3060.0906, 179.40392, 3019.2878, 3048.2559, 3046.4897]
2026-01-23 11:24:39,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 88.0, 1000.0, 1000.0, 1000.0]
2026-01-23 11:24:39,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 13 hours, 31 minutes, 15 seconds)
2026-01-23 11:36:51,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:36:51,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:39:16,457 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1296.10168 ± 1041.024
2026-01-23 11:39:16,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1581.9868, 45.58166, 208.08156, 131.04231, 87.06555, 2863.5298, 2642.183, 1624.0037, 1654.1814, 2123.3606]
2026-01-23 11:39:16,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [522.0, 36.0, 98.0, 67.0, 51.0, 892.0, 807.0, 495.0, 501.0, 649.0]
2026-01-23 11:39:16,470 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 12 hours, 54 minutes, 51 seconds)
2026-01-23 11:50:32,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 11:50:32,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 11:55:47,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2805.22705 ± 634.176
2026-01-23 11:55:47,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3087.6711, 3083.941, 3144.9812, 1671.0201, 3114.4285, 3128.1794, 3121.7483, 3119.7222, 1414.7444, 3165.8364]
2026-01-23 11:55:47,268 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 494.0, 1000.0, 1000.0, 1000.0, 1000.0, 415.0, 1000.0]
2026-01-23 11:55:47,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 12 hours, 50 minutes, 13 seconds)
2026-01-23 12:08:00,385 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:08:00,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:12:17,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2274.63525 ± 1120.555
2026-01-23 12:12:17,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3155.1108, 1372.6124, 3139.09, 3113.3545, 3165.0874, 312.59113, 3079.4475, 3087.0913, 1897.8826, 424.0832]
2026-01-23 12:12:17,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 408.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0, 1000.0, 563.0, 160.0]
2026-01-23 12:12:17,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 12 hours, 50 minutes, 18 seconds)
2026-01-23 12:24:51,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:24:51,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:27:13,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1322.86267 ± 389.921
2026-01-23 12:27:13,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1270.6821, 2015.7357, 1354.0256, 1409.3643, 2049.7302, 1100.0157, 1069.2577, 1123.0496, 1017.61017, 819.1555]
2026-01-23 12:27:13,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [389.0, 607.0, 411.0, 428.0, 614.0, 331.0, 311.0, 349.0, 300.0, 277.0]
2026-01-23 12:27:13,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 41 minutes, 24 seconds)
2026-01-23 12:39:40,536 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:39:40,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 12:44:56,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2786.62842 ± 735.662
2026-01-23 12:44:56,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3152.6753, 3059.0967, 667.1673, 3099.0164, 3130.6392, 3096.0962, 2415.7512, 3097.2747, 3060.174, 3088.3918]
2026-01-23 12:44:56,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 239.0, 1000.0, 1000.0, 1000.0, 749.0, 1000.0, 1000.0, 1000.0]
2026-01-23 12:44:56,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 18 minutes, 43 seconds)
2026-01-23 12:56:18,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 12:56:18,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:00:45,857 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2421.23682 ± 804.006
2026-01-23 13:00:45,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1132.3418, 3106.3616, 2752.1104, 3095.0234, 3108.6626, 1080.2811, 1579.5095, 3094.1243, 2985.0466, 2278.9072]
2026-01-23 13:00:45,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [337.0, 1000.0, 842.0, 1000.0, 1000.0, 323.0, 487.0, 1000.0, 908.0, 684.0]
2026-01-23 13:00:45,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 13 minutes, 24 seconds)
2026-01-23 13:12:53,038 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:12:53,045 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:16:41,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1997.51917 ± 1254.389
2026-01-23 13:16:41,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1820.5114, 548.9269, 97.9288, 3094.5386, 3120.6736, 3152.3699, 3127.6577, 3125.196, 1788.7286, 98.66165]
2026-01-23 13:16:41,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [592.0, 190.0, 58.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 586.0, 55.0]
2026-01-23 13:16:41,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 11 hours, 51 minutes, 59 seconds)
2026-01-23 13:28:51,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:28:51,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:33:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2433.73242 ± 890.209
2026-01-23 13:33:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2332.0608, 1223.5826, 3091.1587, 1201.2151, 3157.007, 3104.1096, 3101.3115, 3073.071, 3108.719, 945.0885]
2026-01-23 13:33:23,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [766.0, 359.0, 1000.0, 358.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 294.0]
2026-01-23 13:33:23,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 37 minutes, 23 seconds)
2026-01-23 13:45:58,490 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 13:45:58,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 13:50:34,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2403.25806 ± 1098.661
2026-01-23 13:50:34,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [680.60675, 3085.0884, 3087.9397, 104.58275, 3069.4116, 1652.1561, 3067.3726, 3080.4214, 3097.59, 3107.4124]
2026-01-23 13:50:34,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [213.0, 1000.0, 1000.0, 56.0, 1000.0, 484.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 13:50:34,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 40 minutes, 9 seconds)
2026-01-23 14:02:19,720 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:02:19,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:04:58,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1412.31079 ± 1247.818
2026-01-23 14:04:58,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1065.1925, 2007.0952, 3115.4717, 184.03664, 154.0122, 221.96585, 41.850048, 3113.128, 3095.4697, 1124.8859]
2026-01-23 14:04:58,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [316.0, 604.0, 1000.0, 96.0, 80.0, 97.0, 44.0, 1000.0, 1000.0, 334.0]
2026-01-23 14:04:58,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 10 hours, 56 minutes, 17 seconds)
2026-01-23 14:17:18,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:17:18,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:21:46,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2424.39038 ± 894.115
2026-01-23 14:21:46,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1156.7626, 885.6897, 3127.726, 1549.3848, 3159.0818, 3109.5117, 3156.369, 1888.7057, 3092.3364, 3118.3384]
2026-01-23 14:21:46,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [356.0, 261.0, 1000.0, 507.0, 1000.0, 1000.0, 1000.0, 576.0, 1000.0, 1000.0]
2026-01-23 14:21:46,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 10 hours, 48 minutes, 5 seconds)
2026-01-23 14:34:04,506 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:34:04,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:38:37,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2444.02832 ± 828.604
2026-01-23 14:38:37,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3120.5798, 1283.873, 2105.074, 974.93585, 1589.1505, 3116.2153, 3123.3809, 3162.2046, 2825.3928, 3139.4766]
2026-01-23 14:38:37,751 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 379.0, 627.0, 352.0, 485.0, 1000.0, 1000.0, 967.0, 887.0, 1000.0]
2026-01-23 14:38:37,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 39 minutes, 4 seconds)
2026-01-23 14:50:29,964 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 14:50:29,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 14:53:19,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1572.38196 ± 854.028
2026-01-23 14:53:19,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1381.9199, 1449.3162, 1295.6704, 2290.746, 2188.2288, 3184.5317, 1650.435, 380.74887, 92.83718, 1809.3846]
2026-01-23 14:53:19,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [411.0, 431.0, 377.0, 689.0, 674.0, 1000.0, 501.0, 150.0, 59.0, 586.0]
2026-01-23 14:53:19,914 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 7 minutes, 34 seconds)
2026-01-23 15:05:10,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:05:10,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:07:41,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1423.92188 ± 583.094
2026-01-23 15:07:41,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1239.8613, 1491.2396, 1068.5947, 1273.2231, 1082.8519, 1354.1812, 1062.2886, 3128.4558, 1329.9427, 1208.5797]
2026-01-23 15:07:41,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [410.0, 443.0, 311.0, 372.0, 324.0, 404.0, 310.0, 1000.0, 394.0, 355.0]
2026-01-23 15:07:41,872 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 9 hours, 30 minutes, 39 seconds)
2026-01-23 15:19:40,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:19:40,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:23:53,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2296.69214 ± 827.511
2026-01-23 15:23:53,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2688.1333, 1404.6422, 3109.2969, 3119.9602, 3102.1, 2239.725, 2112.0825, 1009.27637, 1058.2637, 3123.4443]
2026-01-23 15:23:53,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [810.0, 408.0, 1000.0, 1000.0, 1000.0, 678.0, 641.0, 331.0, 310.0, 1000.0]
2026-01-23 15:23:53,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 28 minutes, 7 seconds)
2026-01-23 15:35:53,039 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:35:53,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:38:49,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1664.20728 ± 676.660
2026-01-23 15:38:49,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1694.6998, 1115.2931, 1149.9414, 1054.9772, 1622.5608, 2670.5537, 1811.2457, 3113.965, 1404.5778, 1004.25824]
2026-01-23 15:38:49,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [510.0, 332.0, 342.0, 314.0, 482.0, 814.0, 543.0, 1000.0, 427.0, 295.0]
2026-01-23 15:38:49,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 8 hours, 59 minutes, 18 seconds)
2026-01-23 15:51:09,130 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 15:51:09,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 15:55:39,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2487.83740 ± 654.882
2026-01-23 15:55:39,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3184.7166, 1459.2383, 3132.1943, 2044.4879, 2375.347, 2007.8994, 3120.0754, 3153.1047, 1536.1852, 2865.1265]
2026-01-23 15:55:39,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 436.0, 1000.0, 627.0, 722.0, 600.0, 1000.0, 1000.0, 457.0, 867.0]
2026-01-23 15:55:39,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 8 hours, 43 minutes, 44 seconds)
2026-01-23 16:07:49,853 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:07:49,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:11:21,098 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1954.15308 ± 696.655
2026-01-23 16:11:21,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1116.5399, 1878.8994, 3050.0012, 3111.6047, 2455.7832, 1951.0793, 2022.9158, 1109.5585, 1234.7094, 1610.4415]
2026-01-23 16:11:21,099 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [377.0, 566.0, 923.0, 1000.0, 750.0, 591.0, 607.0, 328.0, 366.0, 510.0]
2026-01-23 16:11:21,106 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 8 hours, 34 minutes, 55 seconds)
2026-01-23 16:24:16,303 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:24:16,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:26:02,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 915.35706 ± 1129.120
2026-01-23 16:26:02,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [395.76556, 409.66635, 2939.7249, 1896.418, 2911.5588, 171.9397, 66.47381, 38.857197, 170.71248, 152.45354]
2026-01-23 16:26:02,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [152.0, 158.0, 931.0, 578.0, 885.0, 84.0, 44.0, 31.0, 83.0, 75.0]
2026-01-23 16:26:02,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 21 minutes, 23 seconds)
2026-01-23 16:37:38,012 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:37:38,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:41:49,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2292.58813 ± 837.236
2026-01-23 16:41:49,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3112.9126, 3182.075, 1413.4766, 1846.9128, 2709.1125, 2192.0403, 3098.337, 3198.1147, 1009.8994, 1163.0024]
2026-01-23 16:41:49,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 427.0, 572.0, 851.0, 662.0, 1000.0, 1000.0, 349.0, 361.0]
2026-01-23 16:41:49,358 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 3 minutes, 11 seconds)
2026-01-23 16:53:35,064 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 16:53:35,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 16:57:53,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2368.05688 ± 828.520
2026-01-23 16:57:53,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [402.78592, 3170.8284, 2458.3123, 3163.323, 2476.0308, 3114.56, 1401.4735, 2171.5354, 2617.0532, 2704.6653]
2026-01-23 16:57:53,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [157.0, 974.0, 750.0, 1000.0, 759.0, 1000.0, 414.0, 655.0, 796.0, 831.0]
2026-01-23 16:57:53,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 7 hours, 54 minutes, 28 seconds)
2026-01-23 17:09:47,327 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:09:47,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:14:29,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2575.17480 ± 580.157
2026-01-23 17:14:29,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1668.2012, 1981.0939, 2175.3286, 3139.179, 3089.46, 3127.3572, 3108.1096, 1875.6829, 2423.4458, 3163.8914]
2026-01-23 17:14:29,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [489.0, 599.0, 703.0, 1000.0, 1000.0, 1000.0, 1000.0, 562.0, 733.0, 969.0]
2026-01-23 17:14:29,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 7 hours, 37 minutes, 15 seconds)
2026-01-23 17:27:25,187 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:27:25,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:31:35,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2297.37085 ± 763.316
2026-01-23 17:31:35,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3148.5981, 3081.9866, 1034.6079, 2754.5742, 1265.4825, 3141.326, 2073.3125, 1842.6831, 2884.7935, 1746.3444]
2026-01-23 17:31:35,320 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 939.0, 304.0, 830.0, 378.0, 1000.0, 622.0, 576.0, 881.0, 532.0]
2026-01-23 17:31:35,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 29 minutes, 19 seconds)
2026-01-23 17:43:42,326 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:43:42,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 17:48:12,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2438.17578 ± 913.300
2026-01-23 17:48:12,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3110.29, 3184.5771, 3118.8372, 1938.6487, 1288.227, 1887.9792, 3115.8235, 3095.0515, 528.4342, 3113.891]
2026-01-23 17:48:12,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 966.0, 1000.0, 580.0, 381.0, 575.0, 1000.0, 1000.0, 194.0, 1000.0]
2026-01-23 17:48:12,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 23 minutes, 41 seconds)
2026-01-23 17:59:28,657 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 17:59:28,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:02:06,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1357.97400 ± 1256.509
2026-01-23 18:02:06,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1626.3317, 404.1089, 3128.5618, 1471.4789, 3110.0532, 75.1316, 99.23788, 406.50754, 161.82864, 3096.4995]
2026-01-23 18:02:06,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [537.0, 153.0, 1000.0, 439.0, 1000.0, 45.0, 60.0, 166.0, 80.0, 1000.0]
2026-01-23 18:02:06,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 6 hours, 57 minutes, 29 seconds)
2026-01-23 18:14:19,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:14:19,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:18:59,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2587.70825 ± 706.537
2026-01-23 18:18:59,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3133.6592, 3104.1155, 2397.0122, 3138.0476, 2166.5474, 3117.9297, 1370.6373, 3162.0, 2987.3267, 1299.8076]
2026-01-23 18:18:59,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 714.0, 1000.0, 646.0, 1000.0, 396.0, 1000.0, 900.0, 379.0]
2026-01-23 18:18:59,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 45 minutes, 26 seconds)
2026-01-23 18:31:08,796 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:31:08,800 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:35:04,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2160.82300 ± 981.666
2026-01-23 18:35:04,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3175.2285, 2576.8735, 3157.119, 1005.3186, 1076.382, 2322.575, 966.0221, 3158.2185, 3196.7373, 973.7584]
2026-01-23 18:35:04,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 786.0, 1000.0, 296.0, 318.0, 696.0, 288.0, 1000.0, 1000.0, 318.0]
2026-01-23 18:35:04,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 26 minutes, 51 seconds)
2026-01-23 18:46:23,690 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 18:46:23,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 18:50:51,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2452.22314 ± 813.918
2026-01-23 18:50:51,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1608.1461, 3135.2124, 3139.333, 1069.354, 2918.4614, 1532.5736, 3137.8882, 3161.5774, 3124.4033, 1695.2819]
2026-01-23 18:50:51,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [471.0, 1000.0, 1000.0, 313.0, 892.0, 441.0, 1000.0, 1000.0, 1000.0, 552.0]
2026-01-23 18:50:51,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 4 minutes, 37 seconds)
2026-01-23 19:03:05,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:03:05,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:08:35,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2992.91748 ± 448.228
2026-01-23 19:08:35,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3138.6838, 1648.8201, 3146.49, 3144.165, 3141.4912, 3108.315, 3141.2722, 3148.246, 3147.0354, 3164.6565]
2026-01-23 19:08:35,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 472.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:08:35,462 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 5 hours, 53 minutes, 42 seconds)
2026-01-23 19:20:52,343 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:20:52,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:26:09,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2821.99463 ± 862.952
2026-01-23 19:26:09,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3158.9626, 3116.6685, 3109.3877, 3115.2437, 3129.0576, 3152.1604, 3150.2437, 3129.5742, 2917.9392, 240.71]
2026-01-23 19:26:09,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 938.0, 104.0]
2026-01-23 19:26:09,132 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 52 minutes, 58 seconds)
2026-01-23 19:38:11,108 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:38:11,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:40:51,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1400.77502 ± 1052.875
2026-01-23 19:40:51,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1173.9342, 3090.6104, 1490.1763, 3102.286, 2178.5107, 34.410057, 1381.6798, 518.65393, 34.470463, 1003.01874]
2026-01-23 19:40:51,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [351.0, 1000.0, 492.0, 1000.0, 712.0, 29.0, 424.0, 189.0, 40.0, 355.0]
2026-01-23 19:40:51,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 27 minutes, 30 seconds)
2026-01-23 19:52:37,370 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 19:52:37,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 19:58:24,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 3175.62524 ± 54.544
2026-01-23 19:58:24,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3122.1516, 3161.1453, 3145.903, 3296.3118, 3140.6033, 3266.4944, 3155.2812, 3144.7554, 3162.1543, 3161.4514]
2026-01-23 19:58:24,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 19:58:24,015 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1274 [INFO]: New best (3175.63) for latency DatasetOffice
2026-01-23 19:58:24,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 16 minutes, 36 seconds)
2026-01-23 20:11:07,010 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:11:07,013 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:14:02,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1706.47986 ± 799.365
2026-01-23 20:14:02,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1831.9906, 3172.244, 1147.5859, 923.54944, 1092.4933, 3031.4473, 1478.6714, 1062.5748, 1071.6462, 2252.5962]
2026-01-23 20:14:02,433 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [524.0, 1000.0, 339.0, 267.0, 315.0, 898.0, 434.0, 311.0, 314.0, 672.0]
2026-01-23 20:14:02,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 59 minutes, 27 seconds)
2026-01-23 20:25:40,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:25:40,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:28:09,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1328.46106 ± 1240.852
2026-01-23 20:28:09,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3113.7412, 886.9981, 223.17409, 355.78363, 51.669247, 40.46536, 3117.9424, 1179.9635, 3147.246, 1167.6271]
2026-01-23 20:28:09,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 310.0, 102.0, 136.0, 36.0, 37.0, 1000.0, 338.0, 1000.0, 343.0]
2026-01-23 20:28:09,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 30 minutes, 31 seconds)
2026-01-23 20:39:45,377 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:39:45,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:42:20,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1487.31165 ± 639.529
2026-01-23 20:42:20,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1194.1, 3160.4495, 959.9879, 2034.3944, 926.32635, 1572.6223, 1503.2867, 1235.6792, 1094.3141, 1191.9563]
2026-01-23 20:42:20,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [348.0, 1000.0, 284.0, 633.0, 274.0, 450.0, 435.0, 363.0, 318.0, 342.0]
2026-01-23 20:42:20,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 3 minutes, 47 seconds)
2026-01-23 20:54:26,022 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 20:54:26,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 20:59:17,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2618.38428 ± 785.554
2026-01-23 20:59:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3148.6013, 3140.6255, 3137.379, 1299.691, 3110.372, 3122.5266, 3151.2563, 3103.81, 1591.8167, 1377.7657]
2026-01-23 20:59:17,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 400.0, 1000.0, 1000.0, 1000.0, 1000.0, 472.0, 402.0]
2026-01-23 20:59:17,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 3 hours, 55 minutes, 16 seconds)
2026-01-23 21:10:47,840 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:10:47,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:14:45,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2131.02100 ± 1147.124
2026-01-23 21:14:45,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3156.3477, 1633.416, 275.63403, 216.4912, 3094.502, 1220.6251, 3142.8506, 3140.2854, 2347.2788, 3082.78]
2026-01-23 21:14:45,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 530.0, 128.0, 97.0, 1000.0, 380.0, 1000.0, 1000.0, 743.0, 933.0]
2026-01-23 21:14:45,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 33 minutes, 48 seconds)
2026-01-23 21:27:54,698 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:27:54,702 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:32:40,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2597.11035 ± 844.443
2026-01-23 21:32:40,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1258.4235, 1502.0208, 3122.5598, 1177.7607, 3166.8994, 3118.368, 3161.1494, 3121.5024, 3161.168, 3181.2517]
2026-01-23 21:32:40,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [419.0, 430.0, 1000.0, 344.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 21:32:40,981 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 24 minutes, 28 seconds)
2026-01-23 21:44:30,619 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 21:44:30,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 21:49:18,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2576.14526 ± 777.095
2026-01-23 21:49:18,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3124.499, 2508.5562, 3153.493, 3083.2373, 2089.944, 1656.5131, 3117.394, 3121.7493, 3113.5808, 792.4859]
2026-01-23 21:49:18,514 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 766.0, 1000.0, 1000.0, 681.0, 547.0, 1000.0, 1000.0, 1000.0, 277.0]
2026-01-23 21:49:18,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 14 minutes, 45 seconds)
2026-01-23 22:00:31,401 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:00:31,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:03:05,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1404.25073 ± 1248.505
2026-01-23 22:03:05,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3159.6875, 1567.2997, 29.930262, 347.1872, 65.884735, 36.29044, 3166.854, 3043.4805, 1325.1934, 1300.6984]
2026-01-23 22:03:05,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 525.0, 25.0, 146.0, 44.0, 31.0, 1000.0, 910.0, 402.0, 377.0]
2026-01-23 22:03:05,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 2 hours, 57 minutes, 39 seconds)
2026-01-23 22:15:31,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:15:31,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:21:07,048 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 3023.08643 ± 402.744
2026-01-23 22:21:07,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3152.125, 1815.869, 3178.8826, 3161.69, 3165.6287, 3158.8577, 3183.7905, 3137.843, 3125.476, 3150.7002]
2026-01-23 22:21:07,049 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 593.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:21:07,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 43 minutes, 39 seconds)
2026-01-23 22:33:14,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:33:14,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:38:51,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 3077.28149 ± 206.407
2026-01-23 22:38:51,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3114.547, 3157.9094, 2555.2805, 3125.9644, 3351.119, 2867.5452, 3131.4475, 3171.9443, 3182.2214, 3114.836]
2026-01-23 22:38:51,675 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 778.0, 1000.0, 1000.0, 905.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 22:38:51,684 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 31 minutes, 23 seconds)
2026-01-23 22:50:02,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 22:50:02,115 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 22:51:36,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 796.65637 ± 991.188
2026-01-23 22:51:36,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [1144.9077, 13.417128, 1875.6313, 62.851357, 1004.0536, 142.83328, 227.68091, 164.83888, 134.64288, 3195.7063]
2026-01-23 22:51:36,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [378.0, 16.0, 617.0, 62.0, 333.0, 80.0, 99.0, 83.0, 68.0, 1000.0]
2026-01-23 22:51:36,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 6 minutes, 17 seconds)
2026-01-23 23:04:10,758 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:04:10,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:07:20,507 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1821.91443 ± 484.068
2026-01-23 23:07:20,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2139.5112, 1391.0555, 1455.9583, 1411.2684, 2903.1213, 1366.2444, 1670.8888, 2186.4265, 1515.2656, 2179.4036]
2026-01-23 23:07:20,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [648.0, 410.0, 429.0, 410.0, 877.0, 409.0, 492.0, 649.0, 455.0, 632.0]
2026-01-23 23:07:20,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 49 minutes, 14 seconds)
2026-01-23 23:19:46,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:19:46,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:23:52,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2310.20581 ± 613.116
2026-01-23 23:23:52,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2119.0432, 2276.3032, 2154.0476, 1718.4443, 2010.8379, 3184.7236, 3147.73, 1129.0098, 2435.1316, 2926.787]
2026-01-23 23:23:52,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [634.0, 677.0, 670.0, 509.0, 584.0, 1000.0, 1000.0, 328.0, 730.0, 854.0]
2026-01-23 23:23:52,722 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 36 minutes, 56 seconds)
2026-01-23 23:35:54,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:35:54,644 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:39:05,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1679.13806 ± 1499.417
2026-01-23 23:39:05,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3158.9995, 3179.0144, 439.19174, 164.31892, 80.896835, 182.15115, 47.918655, 3188.4028, 3176.23, 3174.256]
2026-01-23 23:39:05,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 173.0, 92.0, 50.0, 84.0, 35.0, 1000.0, 1000.0, 1000.0]
2026-01-23 23:39:05,490 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 17 minutes, 58 seconds)
2026-01-23 23:51:22,789 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-23 23:51:22,794 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 23:57:00,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 3009.40967 ± 454.823
2026-01-23 23:57:00,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3138.8608, 1655.8728, 3125.738, 3137.5688, 3110.1777, 3137.7751, 3127.3972, 3159.3237, 3322.8052, 3178.577]
2026-01-23 23:57:00,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 541.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 988.0, 1000.0]
2026-01-23 23:57:00,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 2 minutes, 30 seconds)
2026-01-24 00:08:09,475 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:08:09,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:13:40,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2953.81128 ± 345.842
2026-01-24 00:13:40,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3075.0017, 3131.4624, 3168.0425, 2888.8765, 3151.9028, 3176.897, 2418.6833, 2168.1262, 3129.0098, 3230.112]
2026-01-24 00:13:40,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [952.0, 1000.0, 1000.0, 869.0, 1000.0, 1000.0, 782.0, 682.0, 1000.0, 1000.0]
2026-01-24 00:13:40,357 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 49 minutes, 14 seconds)
2026-01-24 00:26:10,414 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:26:10,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:29:06,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 1596.32104 ± 1110.078
2026-01-24 00:29:06,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2230.3726, 73.22016, 1693.5385, 222.38794, 1598.0386, 150.4618, 1435.5453, 3177.1526, 2103.6143, 3278.8792]
2026-01-24 00:29:06,949 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [717.0, 42.0, 497.0, 96.0, 591.0, 77.0, 419.0, 1000.0, 605.0, 978.0]
2026-01-24 00:29:06,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 32 minutes, 42 seconds)
2026-01-24 00:41:36,077 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:41:36,081 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 00:45:19,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2118.27954 ± 760.492
2026-01-24 00:45:19,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [2658.0552, 2686.1707, 2018.755, 2040.3842, 3140.0723, 3254.771, 885.76733, 1807.0781, 1279.4918, 1412.248]
2026-01-24 00:45:19,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [770.0, 791.0, 593.0, 601.0, 1000.0, 1000.0, 291.0, 523.0, 375.0, 441.0]
2026-01-24 00:45:19,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 17 seconds)
2026-01-24 00:57:04,264 latency_env.training.mbpac:635 [DEBUG]: train() done
2026-01-24 00:57:04,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-24 01:02:06,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1269 [DEBUG]: Total Reward: 2756.08374 ± 705.804
2026-01-24 01:02:06,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1270 [DEBUG]: All rewards: [3195.974, 3154.05, 1976.9174, 3179.787, 3162.4421, 3168.4697, 998.35565, 3155.9595, 2426.1216, 3142.7622]
2026-01-24 01:02:06,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 581.0, 1000.0, 1000.0, 1000.0, 338.0, 1000.0, 714.0, 1000.0]
2026-01-24 01:02:06,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-hopper):1299 [DEBUG]: Training session finished
