2026-01-23 01:56:39,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mda-mem1
2026-01-23 01:56:39,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mda-mem1
2026-01-23 01:56:39,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x150ee92df2d0>}
2026-01-23 01:56:39,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-23 01:56:39,473 baseline-bpql-mda-noisy-hopper:91 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-23 01:56:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-23 01:56:39,490 baseline-bpql-mda-noisy-hopper:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-23 01:56:39,490 baseline-bpql-mda-noisy-hopper:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:56:39,495 baseline-bpql-mda-noisy-hopper:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=11, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=11, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(3, 384, batch_first=True)
)
2026-01-23 01:56:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-23 01:56:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-23 02:00:01,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:01,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 68.84326 ± 3.241
2026-01-23 02:00:01,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [65.98374, 67.70679, 73.97266, 71.345955, 64.81185, 69.263374, 66.04454, 73.95234, 65.450554, 69.900795]
2026-01-23 02:00:01,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [40.0, 41.0, 45.0, 42.0, 39.0, 42.0, 40.0, 44.0, 40.0, 42.0]
2026-01-23 02:00:01,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (68.84) for latency DatasetOffice
2026-01-23 02:00:01,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 32 minutes, 33 seconds)
2026-01-23 02:03:40,619 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:43,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 298.99500 ± 176.691
2026-01-23 02:03:43,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [367.90103, 492.87042, 304.3204, 515.08136, 51.738537, 62.017445, 360.8677, 403.42072, 19.281841, 412.45044]
2026-01-23 02:03:43,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [206.0, 322.0, 176.0, 338.0, 37.0, 42.0, 201.0, 236.0, 23.0, 256.0]
2026-01-23 02:03:43,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (298.99) for latency DatasetOffice
2026-01-23 02:03:43,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 45 minutes, 20 seconds)
2026-01-23 02:07:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 305.59180 ± 93.582
2026-01-23 02:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [339.057, 337.0506, 346.56003, 25.140633, 335.13956, 331.60394, 339.14716, 335.28537, 330.03186, 336.90182]
2026-01-23 02:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [157.0, 156.0, 160.0, 20.0, 153.0, 151.0, 157.0, 154.0, 151.0, 155.0]
2026-01-23 02:07:16,423 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (305.59) for latency DatasetOffice
2026-01-23 02:07:16,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 42 minutes, 48 seconds)
2026-01-23 02:10:47,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 283.22012 ± 9.434
2026-01-23 02:10:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [266.15637, 263.7576, 284.8792, 284.91687, 286.8442, 292.3475, 288.94846, 285.7924, 290.86447, 287.69397]
2026-01-23 02:10:48,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [124.0, 123.0, 130.0, 132.0, 130.0, 133.0, 132.0, 130.0, 131.0, 131.0]
2026-01-23 02:10:48,845 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 39 minutes, 25 seconds)
2026-01-23 02:14:22,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:24,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 314.60449 ± 6.392
2026-01-23 02:14:24,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [320.69455, 316.32706, 320.92957, 318.35553, 316.88727, 318.36285, 316.77338, 300.41644, 305.6315, 311.66684]
2026-01-23 02:14:24,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [135.0, 135.0, 135.0, 136.0, 135.0, 135.0, 135.0, 133.0, 133.0, 133.0]
2026-01-23 02:14:24,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (314.60) for latency DatasetOffice
2026-01-23 02:14:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 36 minutes, 54 seconds)
2026-01-23 02:17:54,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:55,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 242.05771 ± 50.891
2026-01-23 02:17:55,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [99.56722, 260.29462, 251.88824, 309.70236, 254.88986, 248.48193, 254.80646, 256.89777, 242.95018, 241.09859]
2026-01-23 02:17:55,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [56.0, 114.0, 113.0, 131.0, 112.0, 110.0, 112.0, 113.0, 108.0, 108.0]
2026-01-23 02:17:55,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 36 minutes, 32 seconds)
2026-01-23 02:21:26,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 334.26434 ± 1.950
2026-01-23 02:21:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [336.94763, 336.93265, 334.0403, 330.86713, 333.17456, 333.9984, 332.43652, 335.98016, 332.62714, 335.63876]
2026-01-23 02:21:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [146.0, 147.0, 147.0, 147.0, 147.0, 145.0, 146.0, 147.0, 145.0, 146.0]
2026-01-23 02:21:28,641 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (334.26) for latency DatasetOffice
2026-01-23 02:21:28,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 30 minutes, 18 seconds)
2026-01-23 02:25:03,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:04,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 307.87564 ± 5.464
2026-01-23 02:25:04,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [310.47665, 306.8248, 310.39, 304.6019, 321.32254, 301.98517, 303.635, 305.02203, 311.19922, 303.29877]
2026-01-23 02:25:04,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [141.0, 130.0, 138.0, 130.0, 147.0, 131.0, 132.0, 133.0, 143.0, 131.0]
2026-01-23 02:25:04,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 27 minutes, 41 seconds)
2026-01-23 02:28:34,022 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 330.63446 ± 2.510
2026-01-23 02:28:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [329.59003, 333.80844, 325.0416, 330.88907, 331.34717, 329.93286, 334.20215, 329.95444, 329.02478, 332.5543]
2026-01-23 02:28:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [151.0, 145.0, 146.0, 145.0, 147.0, 149.0, 149.0, 144.0, 147.0, 145.0]
2026-01-23 02:28:35,969 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 23 minutes, 41 seconds)
2026-01-23 02:32:09,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:10,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 267.69894 ± 29.167
2026-01-23 02:32:10,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [195.07733, 260.3834, 237.10034, 289.24686, 274.04666, 288.43933, 279.60202, 269.52487, 290.41864, 293.15018]
2026-01-23 02:32:10,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [88.0, 111.0, 124.0, 120.0, 115.0, 120.0, 117.0, 113.0, 121.0, 122.0]
2026-01-23 02:32:10,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 20 minutes, 1 second)
2026-01-23 02:35:42,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 344.41345 ± 6.571
2026-01-23 02:35:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [350.91205, 352.6484, 337.13428, 342.93173, 338.10892, 345.1409, 339.92532, 334.74417, 354.2884, 348.30023]
2026-01-23 02:35:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [159.0, 158.0, 149.0, 152.0, 151.0, 156.0, 149.0, 149.0, 151.0, 155.0]
2026-01-23 02:35:44,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (344.41) for latency DatasetOffice
2026-01-23 02:35:44,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 16 minutes, 58 seconds)
2026-01-23 02:39:14,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:16,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 326.97842 ± 4.019
2026-01-23 02:39:16,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [324.6752, 330.22757, 337.5548, 325.5668, 327.518, 326.45737, 323.9762, 324.54944, 326.2569, 323.00208]
2026-01-23 02:39:16,554 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [138.0, 140.0, 159.0, 139.0, 140.0, 138.0, 136.0, 138.0, 138.0, 135.0]
2026-01-23 02:39:16,557 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 13 minutes, 15 seconds)
2026-01-23 02:42:48,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:42:49,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 315.64743 ± 79.610
2026-01-23 02:42:49,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [330.14343, 332.32898, 361.20056, 81.339554, 373.80408, 323.94598, 346.98776, 347.00195, 323.00812, 336.71405]
2026-01-23 02:42:49,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [140.0, 136.0, 148.0, 50.0, 159.0, 139.0, 142.0, 139.0, 145.0, 137.0]
2026-01-23 02:42:49,804 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 8 minutes, 47 seconds)
2026-01-23 02:46:16,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:18,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 331.79126 ± 32.738
2026-01-23 02:46:18,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [309.48154, 379.02042, 317.23254, 297.9333, 309.1924, 298.8628, 330.90814, 314.41077, 381.0062, 379.86475]
2026-01-23 02:46:18,319 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [124.0, 143.0, 126.0, 128.0, 124.0, 127.0, 130.0, 134.0, 144.0, 143.0]
2026-01-23 02:46:18,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 4 minutes, 32 seconds)
2026-01-23 02:49:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:49:49,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 325.86090 ± 108.380
2026-01-23 02:49:49,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [219.93231, 425.11862, 425.09393, 197.0416, 471.99994, 437.66843, 248.08958, 203.95319, 401.28204, 228.42975]
2026-01-23 02:49:49,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [152.0, 197.0, 189.0, 136.0, 235.0, 206.0, 165.0, 140.0, 186.0, 156.0]
2026-01-23 02:49:49,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 59 minutes, 59 seconds)
2026-01-23 02:53:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:53:20,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 479.39761 ± 62.450
2026-01-23 02:53:20,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [486.4102, 500.09845, 506.09015, 533.2782, 528.4161, 354.34753, 359.93427, 503.47516, 514.3319, 507.5938]
2026-01-23 02:53:20,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [170.0, 172.0, 176.0, 181.0, 183.0, 141.0, 139.0, 175.0, 179.0, 178.0]
2026-01-23 02:53:20,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (479.40) for latency DatasetOffice
2026-01-23 02:53:20,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 55 minutes, 36 seconds)
2026-01-23 02:56:58,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:05,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1485.37817 ± 731.548
2026-01-23 02:57:05,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1334.0459, 1203.2646, 1047.6935, 278.59006, 1065.2637, 3041.0964, 1017.7601, 1702.7393, 2198.8096, 1964.5198]
2026-01-23 02:57:05,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [443.0, 419.0, 385.0, 161.0, 352.0, 998.0, 351.0, 588.0, 748.0, 653.0]
2026-01-23 02:57:05,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1485.38) for latency DatasetOffice
2026-01-23 02:57:05,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 55 minutes, 40 seconds)
2026-01-23 03:00:25,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:00:27,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 566.35962 ± 160.469
2026-01-23 03:00:27,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [614.6285, 636.20795, 604.4731, 604.7604, 617.72943, 612.16614, 646.9795, 603.84125, 86.89234, 635.9173]
2026-01-23 03:00:27,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [220.0, 224.0, 217.0, 216.0, 221.0, 218.0, 225.0, 216.0, 56.0, 227.0]
2026-01-23 03:00:27,744 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 49 minutes, 10 seconds)
2026-01-23 03:04:01,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:04:03,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 272.35287 ± 349.857
2026-01-23 03:04:03,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [871.43134, 873.9581, 647.3085, 31.690624, 32.7757, 51.514935, 33.093678, 40.069813, 15.687336, 125.9991]
2026-01-23 03:04:03,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [309.0, 315.0, 252.0, 32.0, 36.0, 30.0, 35.0, 31.0, 16.0, 65.0]
2026-01-23 03:04:03,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 47 minutes, 34 seconds)
2026-01-23 03:07:27,532 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:07:31,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 835.19080 ± 320.839
2026-01-23 03:07:31,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [894.63336, 1045.3018, 832.24603, 852.39954, 889.91003, 534.5559, 185.00777, 1507.8097, 730.71704, 879.3275]
2026-01-23 03:07:31,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [307.0, 361.0, 280.0, 291.0, 300.0, 179.0, 92.0, 546.0, 250.0, 303.0]
2026-01-23 03:07:31,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 43 minutes, 5 seconds)
2026-01-23 03:11:04,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:11:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1109.94531 ± 120.464
2026-01-23 03:11:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1066.6926, 1322.6565, 1216.7935, 1028.9885, 1011.3392, 1309.7646, 977.6756, 1105.257, 1038.0233, 1022.2624]
2026-01-23 03:11:09,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [357.0, 436.0, 411.0, 340.0, 338.0, 442.0, 337.0, 355.0, 350.0, 353.0]
2026-01-23 03:11:09,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 41 minutes, 29 seconds)
2026-01-23 03:14:32,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:14:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 973.36102 ± 168.538
2026-01-23 03:14:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [853.4008, 1091.4147, 1034.4587, 1088.5927, 1015.37036, 1345.8441, 883.0361, 819.5402, 859.9278, 742.02527]
2026-01-23 03:14:37,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [283.0, 352.0, 334.0, 338.0, 325.0, 415.0, 293.0, 275.0, 289.0, 242.0]
2026-01-23 03:14:37,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 33 minutes, 26 seconds)
2026-01-23 03:18:07,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:18:10,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 680.16052 ± 187.807
2026-01-23 03:18:10,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [763.73, 726.7416, 727.3968, 738.017, 118.47669, 772.746, 750.16583, 736.04736, 728.28595, 739.9979]
2026-01-23 03:18:10,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [257.0, 246.0, 245.0, 248.0, 67.0, 263.0, 254.0, 250.0, 248.0, 250.0]
2026-01-23 03:18:10,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 32 minutes, 52 seconds)
2026-01-23 03:21:41,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:21:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 710.46478 ± 230.573
2026-01-23 03:21:44,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [634.3127, 874.69635, 103.01573, 871.3118, 896.566, 638.62366, 878.77136, 644.50037, 677.8399, 885.01013]
2026-01-23 03:21:44,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [207.0, 274.0, 56.0, 279.0, 284.0, 219.0, 270.0, 215.0, 221.0, 289.0]
2026-01-23 03:21:44,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 28 minutes, 41 seconds)
2026-01-23 03:25:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:25:15,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 474.11603 ± 278.682
2026-01-23 03:25:15,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [465.30246, 806.0428, 771.0977, 261.31482, 287.07336, 23.256176, 256.90268, 796.79266, 276.38943, 796.988]
2026-01-23 03:25:15,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [154.0, 257.0, 252.0, 134.0, 142.0, 20.0, 136.0, 263.0, 138.0, 243.0]
2026-01-23 03:25:15,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 25 minutes, 58 seconds)
2026-01-23 03:28:41,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:28:44,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 605.08362 ± 253.500
2026-01-23 03:28:44,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [229.96754, 825.6392, 776.83777, 691.0523, 626.97406, 10.985041, 688.59, 669.0159, 756.7918, 774.98193]
2026-01-23 03:28:44,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [102.0, 253.0, 238.0, 218.0, 205.0, 13.0, 220.0, 214.0, 234.0, 236.0]
2026-01-23 03:28:44,241 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 20 minutes, 17 seconds)
2026-01-23 03:32:14,405 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:32:17,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 706.34070 ± 415.204
2026-01-23 03:32:17,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [985.72125, 1063.2118, 1048.3632, 982.3495, 983.4047, 863.42944, 892.0081, 193.1005, 20.157955, 31.66001]
2026-01-23 03:32:17,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [308.0, 327.0, 325.0, 306.0, 311.0, 277.0, 282.0, 92.0, 22.0, 33.0]
2026-01-23 03:32:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 18 minutes, 2 seconds)
2026-01-23 03:35:52,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:35:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 826.96942 ± 92.121
2026-01-23 03:35:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [617.2911, 786.6247, 895.5476, 907.79065, 775.71484, 735.65753, 874.83276, 867.0462, 928.26935, 880.9203]
2026-01-23 03:35:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [213.0, 241.0, 265.0, 272.0, 263.0, 230.0, 270.0, 273.0, 278.0, 266.0]
2026-01-23 03:35:55,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 15 minutes, 27 seconds)
2026-01-23 03:39:18,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:39:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 747.38940 ± 67.383
2026-01-23 03:39:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [774.31213, 780.43805, 750.261, 655.21533, 756.2786, 669.45856, 666.22266, 870.2892, 722.62885, 828.78973]
2026-01-23 03:39:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [242.0, 240.0, 233.0, 198.0, 235.0, 218.0, 218.0, 260.0, 229.0, 251.0]
2026-01-23 03:39:21,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 10 minutes, 22 seconds)
2026-01-23 03:42:49,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1057.51880 ± 113.924
2026-01-23 03:42:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1155.6943, 1132.152, 911.2586, 1183.1012, 1135.0215, 872.43243, 1070.9358, 1157.7257, 893.5156, 1063.3507]
2026-01-23 03:42:54,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [351.0, 341.0, 284.0, 368.0, 342.0, 272.0, 325.0, 351.0, 290.0, 322.0]
2026-01-23 03:42:54,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 7 minutes, 5 seconds)
2026-01-23 03:46:24,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:46:31,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1554.74426 ± 910.764
2026-01-23 03:46:31,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2718.8972, 2907.239, 2639.6511, 1216.8514, 918.7166, 1193.1156, 2025.9036, 31.485949, 898.70306, 996.87933]
2026-01-23 03:46:31,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [873.0, 1000.0, 855.0, 377.0, 272.0, 357.0, 704.0, 35.0, 296.0, 341.0]
2026-01-23 03:46:31,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1554.74) for latency DatasetOffice
2026-01-23 03:46:31,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 5 minutes, 24 seconds)
2026-01-23 03:49:59,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:50:07,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1739.40271 ± 1098.234
2026-01-23 03:50:07,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [559.8979, 14.703197, 22.388248, 1863.7103, 2784.929, 2828.71, 1592.8738, 2813.2612, 2104.5276, 2809.0266]
2026-01-23 03:50:07,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [217.0, 15.0, 21.0, 623.0, 1000.0, 1000.0, 498.0, 1000.0, 671.0, 1000.0]
2026-01-23 03:50:07,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (1739.40) for latency DatasetOffice
2026-01-23 03:50:07,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 2 minutes, 38 seconds)
2026-01-23 03:53:37,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:53:41,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 917.81238 ± 39.785
2026-01-23 03:53:41,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [967.7102, 933.3336, 810.0443, 914.20575, 934.4754, 914.18286, 910.45776, 929.9079, 948.9697, 914.8371]
2026-01-23 03:53:41,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [290.0, 285.0, 247.0, 277.0, 292.0, 287.0, 277.0, 271.0, 289.0, 278.0]
2026-01-23 03:53:41,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 58 minutes, 8 seconds)
2026-01-23 03:57:11,466 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:57:16,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1107.49463 ± 474.657
2026-01-23 03:57:16,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [922.20953, 487.12296, 925.51, 1451.4478, 917.08405, 913.1249, 2350.2766, 937.275, 1215.6216, 955.2746]
2026-01-23 03:57:16,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [293.0, 179.0, 288.0, 438.0, 294.0, 287.0, 761.0, 288.0, 370.0, 293.0]
2026-01-23 03:57:16,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 56 minutes, 18 seconds)
2026-01-23 04:00:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:50,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1098.12183 ± 938.443
2026-01-23 04:00:50,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2931.656, 1161.8597, 189.23222, 323.6261, 871.2559, 220.22072, 2437.5679, 1507.5475, 1325.9814, 12.271845]
2026-01-23 04:00:50,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 359.0, 89.0, 134.0, 280.0, 100.0, 824.0, 488.0, 417.0, 12.0]
2026-01-23 04:00:50,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 53 minutes, 12 seconds)
2026-01-23 04:04:23,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:04:28,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1241.05627 ± 271.462
2026-01-23 04:04:28,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1211.751, 1323.5762, 1988.4628, 1059.8563, 1189.3833, 1196.9584, 1210.6417, 1141.2389, 1194.1926, 894.501]
2026-01-23 04:04:28,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [359.0, 398.0, 623.0, 318.0, 358.0, 352.0, 362.0, 334.0, 368.0, 279.0]
2026-01-23 04:04:28,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 49 minutes, 45 seconds)
2026-01-23 04:07:56,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:08:06,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2082.24023 ± 741.421
2026-01-23 04:08:06,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [629.2476, 1450.3512, 2827.7773, 1892.7263, 2876.6865, 2852.598, 3001.4158, 1874.1055, 1747.7529, 1669.7416]
2026-01-23 04:08:06,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [243.0, 440.0, 1000.0, 610.0, 1000.0, 1000.0, 982.0, 681.0, 635.0, 575.0]
2026-01-23 04:08:06,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2082.24) for latency DatasetOffice
2026-01-23 04:08:06,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 46 minutes, 25 seconds)
2026-01-23 04:11:37,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:11:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 843.84680 ± 490.358
2026-01-23 04:11:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [838.0049, 694.0711, 582.0444, 2127.0002, 890.2414, 864.4107, 837.6965, 640.59015, 57.046345, 907.3614]
2026-01-23 04:11:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [269.0, 226.0, 204.0, 691.0, 272.0, 262.0, 257.0, 225.0, 47.0, 296.0]
2026-01-23 04:11:41,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 43 minutes, 4 seconds)
2026-01-23 04:15:05,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:15:18,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2696.95166 ± 476.471
2026-01-23 04:15:18,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2992.0686, 2853.7512, 2830.049, 2843.7344, 2822.723, 2825.9795, 2848.9612, 1274.4863, 2834.536, 2843.2256]
2026-01-23 04:15:18,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 462.0, 1000.0, 1000.0]
2026-01-23 04:15:18,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1274 [INFO]: New best (2696.95) for latency DatasetOffice
2026-01-23 04:15:18,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 40 minutes)
2026-01-23 04:18:51,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:18:55,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 927.94189 ± 485.258
2026-01-23 04:18:55,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1986.4585, 336.95926, 44.139053, 1135.5428, 1109.2385, 939.29443, 898.9284, 847.71173, 1070.2972, 910.8497]
2026-01-23 04:18:55,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [643.0, 136.0, 38.0, 336.0, 327.0, 277.0, 281.0, 252.0, 319.0, 285.0]
2026-01-23 04:18:55,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 37 minutes, 1 second)
2026-01-23 04:22:22,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:22:29,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1591.38464 ± 754.712
2026-01-23 04:22:29,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1229.7855, 927.7648, 2867.6304, 1228.1345, 940.3073, 890.5053, 1239.5703, 2615.274, 1293.521, 2681.3523]
2026-01-23 04:22:29,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [378.0, 289.0, 1000.0, 360.0, 284.0, 265.0, 363.0, 834.0, 390.0, 927.0]
2026-01-23 04:22:29,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 32 minutes, 34 seconds)
2026-01-23 04:25:57,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:26:01,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1084.90369 ± 489.126
2026-01-23 04:26:01,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1064.1205, 6.8124776, 1381.9374, 1200.5992, 1919.2653, 983.6021, 902.90875, 638.08276, 1276.805, 1474.903]
2026-01-23 04:26:01,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [322.0, 8.0, 444.0, 360.0, 577.0, 300.0, 265.0, 230.0, 385.0, 444.0]
2026-01-23 04:26:01,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 28 minutes)
2026-01-23 04:29:31,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:29:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 875.08905 ± 38.856
2026-01-23 04:29:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [858.579, 888.891, 902.5267, 922.2905, 879.96594, 866.1257, 811.9134, 935.95953, 813.11725, 871.52167]
2026-01-23 04:29:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [253.0, 262.0, 263.0, 270.0, 258.0, 253.0, 243.0, 273.0, 243.0, 255.0]
2026-01-23 04:29:35,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 24 minutes, 6 seconds)
2026-01-23 04:33:02,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:33:06,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 975.84259 ± 115.652
2026-01-23 04:33:06,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1176.9078, 984.80206, 1119.1023, 1105.5657, 917.8081, 911.5807, 895.83594, 889.35266, 967.2109, 790.2602]
2026-01-23 04:33:06,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [359.0, 306.0, 349.0, 340.0, 288.0, 289.0, 284.0, 275.0, 303.0, 272.0]
2026-01-23 04:33:06,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 19 minutes, 26 seconds)
2026-01-23 04:36:36,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:36:41,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1325.56201 ± 718.419
2026-01-23 04:36:41,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1445.0128, 1157.5592, 1742.0605, 1097.7192, 1091.2007, 1163.4143, 3110.706, 969.90393, 102.284615, 1375.7594]
2026-01-23 04:36:41,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [438.0, 355.0, 536.0, 340.0, 332.0, 356.0, 1000.0, 341.0, 55.0, 416.0]
2026-01-23 04:36:41,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 15 minutes, 25 seconds)
2026-01-23 04:40:08,684 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:40:11,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 693.55396 ± 221.168
2026-01-23 04:40:11,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [755.6696, 739.215, 39.72038, 728.1848, 820.9837, 833.6629, 747.6433, 786.50146, 707.41785, 776.5404]
2026-01-23 04:40:11,553 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [227.0, 240.0, 38.0, 236.0, 245.0, 251.0, 225.0, 251.0, 235.0, 231.0]
2026-01-23 04:40:11,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 11 minutes, 14 seconds)
2026-01-23 04:43:43,943 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:43:46,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 725.51361 ± 85.746
2026-01-23 04:43:46,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [643.4044, 645.155, 777.45325, 649.45026, 646.3908, 857.871, 650.5841, 726.96796, 851.8986, 805.96124]
2026-01-23 04:43:46,848 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [207.0, 208.0, 234.0, 210.0, 208.0, 254.0, 209.0, 224.0, 250.0, 240.0]
2026-01-23 04:43:46,856 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 8 minutes, 7 seconds)
2026-01-23 04:47:15,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:47:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1748.53613 ± 1069.577
2026-01-23 04:47:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2376.6694, 3079.9563, 370.77222, 2985.925, 1726.0472, 1241.2046, 3012.211, 714.4489, 49.476387, 1928.6501]
2026-01-23 04:47:22,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [777.0, 1000.0, 147.0, 1000.0, 528.0, 382.0, 1000.0, 245.0, 30.0, 587.0]
2026-01-23 04:47:22,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 5 minutes, 3 seconds)
2026-01-23 04:50:56,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:51:00,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 954.79846 ± 266.807
2026-01-23 04:51:00,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [928.4412, 981.70776, 1078.8322, 940.05707, 1055.2147, 1367.406, 948.8991, 1012.2434, 240.9228, 994.26025]
2026-01-23 04:51:00,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [281.0, 299.0, 329.0, 285.0, 315.0, 405.0, 293.0, 306.0, 106.0, 300.0]
2026-01-23 04:51:00,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 2 minutes, 35 seconds)
2026-01-23 04:54:27,280 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:54:31,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1004.78699 ± 51.547
2026-01-23 04:54:31,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [968.532, 954.50073, 1056.1669, 986.28516, 988.7443, 995.77423, 1134.3907, 959.9152, 1018.79034, 984.7705]
2026-01-23 04:54:31,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [304.0, 297.0, 321.0, 304.0, 312.0, 310.0, 347.0, 302.0, 316.0, 306.0]
2026-01-23 04:54:31,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 58 minutes, 18 seconds)
2026-01-23 04:57:56,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:58:04,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1763.27637 ± 856.462
2026-01-23 04:58:04,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [3147.626, 2770.6982, 1717.9829, 1755.9624, 1168.8448, 2056.0066, 758.74426, 124.35796, 1861.2574, 2271.2827]
2026-01-23 04:58:04,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 872.0, 523.0, 546.0, 358.0, 652.0, 269.0, 76.0, 570.0, 700.0]
2026-01-23 04:58:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 55 minutes, 14 seconds)
2026-01-23 05:01:36,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:01:39,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 677.31158 ± 462.433
2026-01-23 05:01:39,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [991.00104, 973.80597, 1144.723, 1180.3848, 998.6066, 975.46783, 353.4952, 12.588986, 9.168474, 133.87341]
2026-01-23 05:01:39,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [293.0, 307.0, 348.0, 360.0, 303.0, 298.0, 140.0, 16.0, 10.0, 100.0]
2026-01-23 05:01:39,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 51 minutes, 37 seconds)
2026-01-23 05:05:08,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:05:15,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1700.26489 ± 664.781
2026-01-23 05:05:15,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1283.7208, 2131.7893, 1339.4672, 1107.2476, 2544.1482, 842.3609, 3027.439, 1276.8429, 1418.4259, 2031.2069]
2026-01-23 05:05:15,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [392.0, 666.0, 477.0, 327.0, 808.0, 317.0, 961.0, 398.0, 439.0, 609.0]
2026-01-23 05:05:15,296 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 48 minutes)
2026-01-23 05:08:41,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:08:44,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 811.95038 ± 256.375
2026-01-23 05:08:44,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1226.7128, 848.9218, 667.0086, 839.72375, 797.36414, 739.633, 179.25029, 979.48334, 968.8244, 872.5819]
2026-01-23 05:08:44,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [370.0, 252.0, 213.0, 247.0, 240.0, 227.0, 86.0, 317.0, 288.0, 256.0]
2026-01-23 05:08:44,671 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 43 minutes, 9 seconds)
2026-01-23 05:12:22,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1261.23218 ± 882.266
2026-01-23 05:12:27,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [2468.6228, 251.71031, 349.29865, 252.89731, 2399.2454, 1747.3331, 2428.0962, 1163.1891, 982.2776, 569.65173]
2026-01-23 05:12:27,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [810.0, 112.0, 152.0, 110.0, 747.0, 562.0, 756.0, 350.0, 370.0, 227.0]
2026-01-23 05:12:27,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 41 minutes, 26 seconds)
2026-01-23 05:15:48,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:15:50,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 500.07950 ± 309.005
2026-01-23 05:15:50,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [650.0324, 791.13367, 673.8096, 801.5238, 804.6578, 743.3077, 283.95456, 50.90812, 40.5585, 160.90862]
2026-01-23 05:15:50,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [208.0, 237.0, 215.0, 241.0, 240.0, 228.0, 114.0, 34.0, 24.0, 98.0]
2026-01-23 05:15:50,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 36 minutes, 20 seconds)
2026-01-23 05:19:22,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:19:25,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 796.77136 ± 136.683
2026-01-23 05:19:25,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [891.28046, 950.43677, 743.5416, 883.52484, 912.6855, 638.9645, 824.6902, 733.63983, 494.61877, 894.3317]
2026-01-23 05:19:25,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [261.0, 297.0, 226.0, 258.0, 266.0, 211.0, 244.0, 225.0, 173.0, 262.0]
2026-01-23 05:19:26,005 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 32 minutes, 51 seconds)
2026-01-23 05:22:54,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:22:57,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 784.68713 ± 52.513
2026-01-23 05:22:57,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [827.9594, 711.56757, 797.6403, 855.20715, 772.89825, 860.5862, 797.2281, 729.835, 787.6332, 706.3166]
2026-01-23 05:22:57,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [246.0, 222.0, 238.0, 254.0, 234.0, 252.0, 239.0, 226.0, 238.0, 220.0]
2026-01-23 05:22:57,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2026-01-23 05:26:29,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:26:32,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 750.96936 ± 60.842
2026-01-23 05:26:32,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [756.6466, 787.9085, 697.5306, 742.0354, 784.77594, 777.0201, 596.8789, 752.8912, 827.4504, 786.5565]
2026-01-23 05:26:32,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [230.0, 238.0, 221.0, 228.0, 238.0, 235.0, 202.0, 230.0, 249.0, 237.0]
2026-01-23 05:26:32,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 25 minutes, 53 seconds)
2026-01-23 05:29:53,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:29:58,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1153.59497 ± 267.770
2026-01-23 05:29:58,491 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1079.1147, 943.8078, 931.463, 1171.9308, 1218.2222, 1160.6346, 951.278, 1857.2516, 1303.3129, 918.93365]
2026-01-23 05:29:58,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [328.0, 284.0, 271.0, 356.0, 367.0, 345.0, 284.0, 567.0, 391.0, 269.0]
2026-01-23 05:29:58,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 20 minutes, 6 seconds)
2026-01-23 05:33:30,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:33:33,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 715.24670 ± 285.519
2026-01-23 05:33:33,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [818.08685, 913.27167, 188.20248, 819.6423, 915.135, 800.4328, 908.7917, 833.4842, 113.53611, 841.88367]
2026-01-23 05:33:33,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [243.0, 267.0, 89.0, 241.0, 266.0, 239.0, 266.0, 245.0, 61.0, 250.0]
2026-01-23 05:33:33,495 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 18 minutes, 11 seconds)
2026-01-23 05:37:02,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:37:05,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 709.10608 ± 158.712
2026-01-23 05:37:05,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [753.28455, 766.11755, 644.29193, 305.7526, 882.59033, 634.0564, 799.73804, 793.8318, 645.39154, 866.0063]
2026-01-23 05:37:05,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [229.0, 232.0, 208.0, 128.0, 263.0, 204.0, 238.0, 236.0, 208.0, 253.0]
2026-01-23 05:37:05,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 14 minutes, 12 seconds)
2026-01-23 05:40:35,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:40:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1033.64722 ± 275.596
2026-01-23 05:40:39,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [864.5479, 909.65497, 972.1957, 1782.6578, 983.26013, 747.6097, 951.934, 911.52386, 975.78546, 1237.3019]
2026-01-23 05:40:39,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [256.0, 269.0, 294.0, 527.0, 294.0, 241.0, 283.0, 269.0, 287.0, 368.0]
2026-01-23 05:40:39,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 11 minutes, 1 second)
2026-01-23 05:44:06,414 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:44:09,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 705.00714 ± 351.607
2026-01-23 05:44:09,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [885.6222, 904.969, 813.1308, 952.61646, 880.662, 984.223, 978.1573, 574.84393, 25.766672, 50.079945]
2026-01-23 05:44:09,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [261.0, 266.0, 243.0, 276.0, 258.0, 290.0, 285.0, 194.0, 24.0, 32.0]
2026-01-23 05:44:09,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 6 minutes, 50 seconds)
2026-01-23 05:47:35,737 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:47:42,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1621.97656 ± 478.984
2026-01-23 05:47:42,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1672.0574, 948.504, 1207.3451, 1731.7543, 1632.4816, 950.8214, 2031.6349, 2546.9243, 1477.1167, 2021.1252]
2026-01-23 05:47:42,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [509.0, 307.0, 370.0, 523.0, 496.0, 325.0, 623.0, 792.0, 457.0, 647.0]
2026-01-23 05:47:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 4 minutes, 7 seconds)
2026-01-23 05:51:12,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:51:15,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 833.13702 ± 194.541
2026-01-23 05:51:15,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [965.4203, 916.8931, 899.40234, 966.7905, 959.4727, 408.4789, 898.8337, 862.3438, 958.86383, 494.871]
2026-01-23 05:51:15,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [282.0, 269.0, 264.0, 287.0, 282.0, 154.0, 265.0, 253.0, 281.0, 176.0]
2026-01-23 05:51:15,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 24 seconds)
2026-01-23 05:54:44,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:54:47,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 861.18066 ± 170.566
2026-01-23 05:54:47,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [976.7754, 973.0912, 975.72015, 979.1437, 439.9403, 878.2177, 969.87604, 653.48016, 835.9527, 929.609]
2026-01-23 05:54:47,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 289.0, 289.0, 289.0, 165.0, 279.0, 291.0, 206.0, 275.0, 274.0]
2026-01-23 05:54:47,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 56 minutes, 51 seconds)
2026-01-23 05:58:18,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:58:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 788.86194 ± 385.989
2026-01-23 05:58:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [967.0373, 992.3719, 9.954727, 906.81836, 1016.33276, 1061.7576, 1037.9365, 978.8117, 879.4465, 38.15297]
2026-01-23 05:58:21,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 289.0, 11.0, 264.0, 299.0, 309.0, 304.0, 291.0, 272.0, 39.0]
2026-01-23 05:58:21,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 53 minutes, 12 seconds)
2026-01-23 06:01:49,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:01:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 637.81482 ± 394.823
2026-01-23 06:01:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [158.96727, 938.3499, 954.05096, 553.938, 18.508486, 13.697908, 974.11743, 883.7878, 916.439, 966.29083]
2026-01-23 06:01:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [83.0, 275.0, 280.0, 195.0, 15.0, 14.0, 297.0, 261.0, 280.0, 297.0]
2026-01-23 06:01:52,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 49 minutes, 50 seconds)
2026-01-23 06:05:23,413 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:05:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 954.99286 ± 37.408
2026-01-23 06:05:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [935.54645, 964.28394, 858.09467, 978.70325, 966.19653, 980.92755, 985.3379, 974.14764, 926.08307, 980.6076]
2026-01-23 06:05:27,146 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [272.0, 284.0, 253.0, 293.0, 296.0, 291.0, 292.0, 290.0, 271.0, 288.0]
2026-01-23 06:05:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 46 minutes, 28 seconds)
2026-01-23 06:08:53,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:08:57,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 952.13849 ± 196.650
2026-01-23 06:08:57,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1064.4462, 1188.2893, 971.79065, 945.82996, 950.87305, 964.1929, 424.47968, 896.7082, 1141.3802, 973.3942]
2026-01-23 06:08:57,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [320.0, 344.0, 294.0, 294.0, 295.0, 297.0, 166.0, 279.0, 332.0, 291.0]
2026-01-23 06:08:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 42 minutes, 34 seconds)
2026-01-23 06:12:22,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:12:23,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 135.01807 ± 204.266
2026-01-23 06:12:23,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [198.57663, 48.94838, 10.524577, 85.5738, 52.177383, 33.132153, 91.554634, 24.489836, 76.02585, 729.1773]
2026-01-23 06:12:23,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [94.0, 32.0, 16.0, 67.0, 41.0, 24.0, 58.0, 21.0, 86.0, 220.0]
2026-01-23 06:12:23,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 38 minutes, 30 seconds)
2026-01-23 06:15:54,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:15:57,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 794.65796 ± 76.249
2026-01-23 06:15:57,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [829.2122, 693.26733, 824.7006, 637.6922, 883.0454, 846.41156, 841.1835, 723.6527, 819.9695, 847.44495]
2026-01-23 06:15:57,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [249.0, 220.0, 247.0, 211.0, 259.0, 255.0, 251.0, 229.0, 248.0, 253.0]
2026-01-23 06:15:57,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 35 minutes, 5 seconds)
2026-01-23 06:19:28,977 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:19:32,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 954.35614 ± 32.109
2026-01-23 06:19:32,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [963.29364, 968.72076, 886.32245, 946.92084, 976.49054, 989.86707, 964.4296, 975.777, 901.6282, 970.11163]
2026-01-23 06:19:32,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [288.0, 296.0, 262.0, 275.0, 290.0, 299.0, 282.0, 284.0, 265.0, 293.0]
2026-01-23 06:19:32,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 31 minutes, 54 seconds)
2026-01-23 06:22:59,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:23:03,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 936.08838 ± 34.493
2026-01-23 06:23:03,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [910.2004, 985.28625, 934.26624, 984.4402, 901.39777, 982.6635, 909.49426, 947.96136, 902.2311, 902.9426]
2026-01-23 06:23:03,305 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [263.0, 287.0, 271.0, 286.0, 263.0, 285.0, 263.0, 272.0, 263.0, 262.0]
2026-01-23 06:23:03,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 28 minutes)
2026-01-23 06:26:32,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:26:36,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 840.62531 ± 278.347
2026-01-23 06:26:36,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [883.2416, 961.6428, 883.34, 902.9915, 965.62024, 975.7003, 934.2639, 886.48047, 14.267905, 998.704]
2026-01-23 06:26:36,115 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [257.0, 279.0, 258.0, 264.0, 282.0, 289.0, 272.0, 259.0, 13.0, 310.0]
2026-01-23 06:26:36,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 24 minutes, 42 seconds)
2026-01-23 06:30:01,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:30:04,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 708.41284 ± 309.844
2026-01-23 06:30:04,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [10.012692, 847.8504, 661.86536, 859.135, 868.4252, 1078.6732, 788.8983, 247.97066, 874.9599, 846.3376]
2026-01-23 06:30:04,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [10.0, 250.0, 214.0, 255.0, 255.0, 328.0, 238.0, 104.0, 257.0, 251.0]
2026-01-23 06:30:04,801 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 21 minutes, 22 seconds)
2026-01-23 06:33:34,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:33:44,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2420.74487 ± 911.670
2026-01-23 06:33:44,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1526.3705, 2956.5913, 2974.6418, 3005.8884, 2980.4495, 663.1057, 2976.0112, 3068.23, 994.4703, 3061.6892]
2026-01-23 06:33:44,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [521.0, 1000.0, 1000.0, 1000.0, 993.0, 199.0, 1000.0, 1000.0, 299.0, 1000.0]
2026-01-23 06:33:44,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 18 minutes, 15 seconds)
2026-01-23 06:37:18,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:37:21,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 836.66620 ± 277.535
2026-01-23 06:37:21,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [967.94415, 979.24817, 976.68506, 900.35565, 20.614426, 896.9983, 870.4715, 971.3825, 975.59357, 807.36835]
2026-01-23 06:37:21,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [282.0, 296.0, 296.0, 270.0, 21.0, 276.0, 258.0, 288.0, 285.0, 243.0]
2026-01-23 06:37:21,913 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 14 minutes, 50 seconds)
2026-01-23 06:40:48,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:40:52,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 970.69739 ± 36.514
2026-01-23 06:40:52,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [991.094, 978.1167, 977.0845, 989.7811, 980.5409, 862.199, 983.91486, 974.675, 985.27997, 984.2879]
2026-01-23 06:40:52,081 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [296.0, 289.0, 286.0, 291.0, 292.0, 259.0, 287.0, 291.0, 289.0, 297.0]
2026-01-23 06:40:52,111 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 11 minutes, 15 seconds)
2026-01-23 06:44:16,460 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:44:19,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 791.03430 ± 55.355
2026-01-23 06:44:19,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [815.53723, 829.4985, 771.35425, 789.50964, 846.6737, 854.36945, 663.5652, 743.2308, 763.4708, 833.13403]
2026-01-23 06:44:19,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [243.0, 245.0, 235.0, 238.0, 250.0, 252.0, 214.0, 228.0, 233.0, 248.0]
2026-01-23 06:44:19,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 7 minutes, 21 seconds)
2026-01-23 06:47:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:47:53,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 928.11243 ± 81.209
2026-01-23 06:47:53,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [866.0397, 837.2646, 739.27997, 974.89056, 990.9267, 946.4174, 980.83746, 983.5664, 969.1758, 992.7259]
2026-01-23 06:47:53,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [257.0, 258.0, 240.0, 287.0, 300.0, 278.0, 292.0, 300.0, 288.0, 298.0]
2026-01-23 06:47:53,901 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 4 minutes, 8 seconds)
2026-01-23 06:51:24,664 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:51:28,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 919.45770 ± 137.049
2026-01-23 06:51:28,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [940.8426, 984.90375, 917.6889, 964.9464, 973.8202, 986.6714, 991.3783, 977.5102, 513.9351, 942.8801]
2026-01-23 06:51:28,286 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [274.0, 288.0, 270.0, 283.0, 283.0, 296.0, 298.0, 284.0, 181.0, 287.0]
2026-01-23 06:51:28,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 15 seconds)
2026-01-23 06:54:55,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:54:58,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 943.93036 ± 80.107
2026-01-23 06:54:58,811 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [945.98285, 940.16693, 774.535, 947.7281, 953.4101, 1114.7142, 900.39813, 963.57367, 905.5442, 993.2506]
2026-01-23 06:54:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [275.0, 273.0, 236.0, 275.0, 277.0, 330.0, 265.0, 280.0, 264.0, 296.0]
2026-01-23 06:54:58,823 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 56 minutes, 22 seconds)
2026-01-23 06:58:30,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:58:33,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 939.91699 ± 52.945
2026-01-23 06:58:33,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [985.3814, 918.40674, 975.06726, 952.4555, 939.3917, 801.5452, 904.91736, 962.66156, 985.1116, 974.2317]
2026-01-23 06:58:33,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [288.0, 269.0, 283.0, 279.0, 274.0, 240.0, 267.0, 282.0, 294.0, 285.0]
2026-01-23 06:58:33,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 53 minutes, 4 seconds)
2026-01-23 07:01:58,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:02:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 997.37939 ± 55.113
2026-01-23 07:02:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [958.7567, 970.7319, 993.3268, 1122.6742, 951.9639, 1064.3883, 1010.7177, 974.73785, 925.2698, 1001.2265]
2026-01-23 07:02:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [279.0, 284.0, 294.0, 328.0, 281.0, 310.0, 294.0, 287.0, 270.0, 292.0]
2026-01-23 07:02:02,607 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 49 minutes, 36 seconds)
2026-01-23 07:05:30,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:05:40,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 2208.89307 ± 1218.259
2026-01-23 07:05:40,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [75.123116, 3084.9595, 1610.3428, 9.722399, 1736.1216, 3091.066, 3112.1113, 3121.402, 3149.1152, 3098.9668]
2026-01-23 07:05:40,191 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [45.0, 1000.0, 485.0, 10.0, 524.0, 1000.0, 1000.0, 1000.0, 961.0, 1000.0]
2026-01-23 07:05:40,201 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 46 minutes, 12 seconds)
2026-01-23 07:09:13,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:09:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1097.30310 ± 178.646
2026-01-23 07:09:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1203.9642, 997.92676, 916.9068, 1254.3943, 943.7799, 920.12494, 907.2723, 1120.4518, 1446.754, 1261.4557]
2026-01-23 07:09:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [347.0, 292.0, 268.0, 373.0, 276.0, 268.0, 271.0, 325.0, 422.0, 372.0]
2026-01-23 07:09:17,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 42 minutes, 47 seconds)
2026-01-23 07:12:46,507 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:12:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 925.93274 ± 52.407
2026-01-23 07:12:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [840.09033, 923.18427, 933.9862, 943.71246, 967.4804, 897.1281, 971.8611, 994.1234, 829.316, 958.4442]
2026-01-23 07:12:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [252.0, 272.0, 273.0, 276.0, 282.0, 265.0, 285.0, 299.0, 251.0, 281.0]
2026-01-23 07:12:50,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 39 minutes, 16 seconds)
2026-01-23 07:16:16,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:16:20,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 963.48682 ± 53.913
2026-01-23 07:16:20,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [954.835, 973.51843, 971.9236, 838.63995, 1000.183, 1005.06445, 987.7296, 955.89777, 905.3119, 1041.7648]
2026-01-23 07:16:20,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [276.0, 282.0, 282.0, 255.0, 299.0, 292.0, 292.0, 279.0, 266.0, 306.0]
2026-01-23 07:16:20,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 35 minutes, 33 seconds)
2026-01-23 07:19:50,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:19:56,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1561.67615 ± 896.852
2026-01-23 07:19:56,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1166.7562, 1150.849, 3198.612, 2967.8193, 1440.527, 980.1439, 2061.1245, 1272.0344, 35.946167, 1342.9487]
2026-01-23 07:19:56,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [349.0, 333.0, 1000.0, 932.0, 433.0, 298.0, 601.0, 379.0, 35.0, 428.0]
2026-01-23 07:19:56,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 32 minutes, 13 seconds)
2026-01-23 07:23:25,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:23:29,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 928.81921 ± 236.514
2026-01-23 07:23:29,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [940.46, 308.24667, 945.44086, 826.63947, 1051.8197, 921.3598, 979.4987, 1017.70483, 1004.90686, 1292.116]
2026-01-23 07:23:29,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [274.0, 122.0, 289.0, 249.0, 310.0, 272.0, 291.0, 301.0, 298.0, 376.0]
2026-01-23 07:23:29,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 28 minutes, 31 seconds)
2026-01-23 07:27:00,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:27:06,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1545.01721 ± 792.853
2026-01-23 07:27:06,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1400.8958, 980.22327, 2921.3901, 624.3739, 3186.6833, 1211.8871, 1557.103, 1108.9834, 1163.1213, 1295.5104]
2026-01-23 07:27:06,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [406.0, 295.0, 880.0, 217.0, 1000.0, 376.0, 470.0, 326.0, 349.0, 384.0]
2026-01-23 07:27:06,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 24 minutes, 55 seconds)
2026-01-23 07:30:34,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:30:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1122.84119 ± 641.106
2026-01-23 07:30:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1170.0299, 636.0528, 1093.7222, 1436.9855, 1613.3306, 506.27267, 53.207146, 2536.0962, 1030.8398, 1151.8755]
2026-01-23 07:30:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [344.0, 221.0, 329.0, 431.0, 486.0, 186.0, 33.0, 757.0, 309.0, 339.0]
2026-01-23 07:30:39,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 21 minutes, 22 seconds)
2026-01-23 07:34:06,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:34:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1210.37659 ± 469.785
2026-01-23 07:34:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1187.0305, 1219.2444, 1169.1735, 2535.933, 972.07947, 889.3104, 1000.4164, 969.9187, 804.6872, 1355.9716]
2026-01-23 07:34:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [366.0, 364.0, 353.0, 754.0, 303.0, 264.0, 293.0, 295.0, 246.0, 419.0]
2026-01-23 07:34:10,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 17 minutes, 50 seconds)
2026-01-23 07:37:42,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:37:46,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1069.50989 ± 108.124
2026-01-23 07:37:46,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [971.2811, 971.82666, 1189.8015, 1194.3907, 1015.70276, 972.08356, 974.30273, 1186.7946, 1230.3961, 988.5188]
2026-01-23 07:37:46,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [291.0, 284.0, 342.0, 346.0, 313.0, 297.0, 294.0, 349.0, 358.0, 297.0]
2026-01-23 07:37:46,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 14 minutes, 16 seconds)
2026-01-23 07:41:10,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:41:16,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1316.94238 ± 679.070
2026-01-23 07:41:16,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1063.7362, 1229.9075, 3080.7307, 941.49194, 993.33136, 756.87213, 1008.16, 1006.9197, 2080.1714, 1008.103]
2026-01-23 07:41:16,306 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [321.0, 372.0, 924.0, 277.0, 294.0, 238.0, 308.0, 298.0, 626.0, 307.0]
2026-01-23 07:41:16,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 10 minutes, 40 seconds)
2026-01-23 07:44:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:44:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 742.49023 ± 586.080
2026-01-23 07:44:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [997.13855, 1080.4218, 1134.7363, 1121.2354, 1435.001, 1473.7736, 23.928099, 48.491577, 79.50361, 30.672495]
2026-01-23 07:44:51,122 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [298.0, 331.0, 333.0, 328.0, 420.0, 435.0, 20.0, 54.0, 66.0, 28.0]
2026-01-23 07:44:51,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 5 seconds)
2026-01-23 07:48:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:48:20,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 1133.10327 ± 303.460
2026-01-23 07:48:20,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1277.2445, 1166.2957, 761.3776, 667.0011, 1260.8915, 992.5306, 996.4837, 1728.2716, 1476.6222, 1004.31366]
2026-01-23 07:48:20,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [390.0, 352.0, 249.0, 221.0, 388.0, 292.0, 301.0, 512.0, 438.0, 308.0]
2026-01-23 07:48:20,716 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 32 seconds)
2026-01-23 07:51:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:51:54,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1269 [DEBUG]: Total Reward: 880.67102 ± 279.825
2026-01-23 07:51:54,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1270 [DEBUG]: All rewards: [1475.2538, 845.9917, 978.39496, 763.8513, 933.0228, 865.81433, 955.5066, 251.25325, 875.254, 862.36694]
2026-01-23 07:51:54,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [454.0, 255.0, 290.0, 240.0, 274.0, 256.0, 286.0, 107.0, 258.0, 256.0]
2026-01-23 07:51:54,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-hopper):1299 [DEBUG]: Training session finished
