2026-01-23 01:59:33,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mda-mem5 
2026-01-23 01:59:33,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mda-mem5 
2026-01-23 01:59:33,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x149f53afed90>}
2026-01-23 01:59:33,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-23 01:59:33,389 baseline-bpql-mda-noisy-walker2d:91 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-23 01:59:33,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-23 01:59:33,406 baseline-bpql-mda-noisy-walker2d:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-23 01:59:33,406 baseline-bpql-mda-noisy-walker2d:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-23 01:59:33,412 baseline-bpql-mda-noisy-walker2d:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2026-01-23 01:59:34,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-23 01:59:34,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-23 02:02:58,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:01,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 225.50032 ± 60.626
2026-01-23 02:03:01,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [188.29776, 235.47516, 293.5502, 181.5869, 161.88762, 331.27496, 265.57263, 174.57143, 144.76039, 278.0262]
2026-01-23 02:03:01,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [115.0, 137.0, 180.0, 111.0, 90.0, 227.0, 160.0, 96.0, 344.0, 211.0]
2026-01-23 02:03:01,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (225.50) for latency DatasetOffice
2026-01-23 02:03:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 41 minutes, 6 seconds)
2026-01-23 02:06:37,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:41,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 284.96417 ± 81.879
2026-01-23 02:06:41,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [221.7274, 306.99445, 480.83862, 223.95926, 333.24026, 268.4738, 315.48935, 166.92302, 299.76852, 232.22675]
2026-01-23 02:06:41,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [130.0, 230.0, 676.0, 151.0, 236.0, 160.0, 525.0, 267.0, 203.0, 308.0]
2026-01-23 02:06:41,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (284.96) for latency DatasetOffice
2026-01-23 02:06:41,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 49 minutes)
2026-01-23 02:10:20,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:23,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 281.01495 ± 78.981
2026-01-23 02:10:23,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [237.33026, 268.06686, 310.84723, 270.03018, 324.8127, 320.08322, 338.50833, 289.8783, 377.3734, 73.21906]
2026-01-23 02:10:23,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [116.0, 133.0, 178.0, 153.0, 198.0, 205.0, 180.0, 166.0, 232.0, 270.0]
2026-01-23 02:10:23,006 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 49 minutes, 35 seconds)
2026-01-23 02:14:04,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:11,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 371.19470 ± 240.753
2026-01-23 02:14:11,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [169.19179, 73.978004, 115.15553, 746.31744, 369.02637, 742.84753, 244.96309, 251.41635, 645.90283, 353.14795]
2026-01-23 02:14:11,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [328.0, 202.0, 291.0, 1000.0, 206.0, 1000.0, 116.0, 485.0, 1000.0, 222.0]
2026-01-23 02:14:11,250 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (371.19) for latency DatasetOffice
2026-01-23 02:14:11,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 50 minutes, 47 seconds)
2026-01-23 02:17:41,168 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:43,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 381.51199 ± 63.402
2026-01-23 02:17:43,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [303.48526, 401.48724, 396.35165, 535.1007, 403.29672, 364.5406, 377.19693, 386.16785, 358.56662, 288.92627]
2026-01-23 02:17:43,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [152.0, 206.0, 214.0, 363.0, 192.0, 190.0, 197.0, 196.0, 167.0, 136.0]
2026-01-23 02:17:43,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (381.51) for latency DatasetOffice
2026-01-23 02:17:43,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 45 minutes, 3 seconds)
2026-01-23 02:21:19,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 271.73026 ± 102.218
2026-01-23 02:21:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [323.9309, 327.7761, 333.14877, 298.83264, 40.498974, 283.19272, 280.60748, 345.87637, 370.4358, 113.00274]
2026-01-23 02:21:22,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 155.0, 251.0, 157.0, 254.0, 166.0, 142.0, 187.0, 182.0, 232.0]
2026-01-23 02:21:22,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 45 minutes, 3 seconds)
2026-01-23 02:24:59,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:01,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 343.11945 ± 109.523
2026-01-23 02:25:01,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [287.21307, 428.31595, 346.7879, 435.02176, 264.08194, 302.024, 322.84067, 596.8702, 225.15077, 222.88786]
2026-01-23 02:25:01,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [134.0, 201.0, 174.0, 204.0, 126.0, 149.0, 163.0, 373.0, 125.0, 116.0]
2026-01-23 02:25:01,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 41 minutes, 5 seconds)
2026-01-23 02:28:35,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:38,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 374.59937 ± 74.013
2026-01-23 02:28:38,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [383.4908, 436.99072, 232.92754, 397.35812, 390.5311, 444.4908, 367.10495, 271.78647, 487.64078, 333.67215]
2026-01-23 02:28:38,015 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [193.0, 206.0, 136.0, 194.0, 180.0, 202.0, 162.0, 145.0, 311.0, 159.0]
2026-01-23 02:28:38,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 35 minutes, 48 seconds)
2026-01-23 02:32:15,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:17,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 277.26691 ± 47.791
2026-01-23 02:32:17,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [236.66649, 259.69177, 183.71529, 286.71332, 378.48972, 278.54593, 256.05765, 302.46262, 289.3174, 301.00906]
2026-01-23 02:32:17,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [168.0, 156.0, 110.0, 154.0, 224.0, 159.0, 158.0, 172.0, 186.0, 171.0]
2026-01-23 02:32:17,634 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 29 minutes, 32 seconds)
2026-01-23 02:35:54,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:35:58,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 441.33038 ± 80.519
2026-01-23 02:35:58,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [390.54553, 460.17923, 370.15903, 640.54675, 351.0552, 506.6062, 431.15756, 455.0298, 432.81802, 375.20648]
2026-01-23 02:35:58,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [329.0, 293.0, 219.0, 370.0, 211.0, 284.0, 244.0, 240.0, 299.0, 197.0]
2026-01-23 02:35:58,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (441.33) for latency DatasetOffice
2026-01-23 02:35:58,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 28 minutes, 14 seconds)
2026-01-23 02:39:32,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:39:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 320.74106 ± 42.133
2026-01-23 02:39:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [330.14288, 380.62198, 289.124, 302.0, 334.19086, 260.86157, 282.23218, 289.65225, 399.26532, 339.31967]
2026-01-23 02:39:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [176.0, 269.0, 150.0, 163.0, 184.0, 161.0, 183.0, 170.0, 271.0, 178.0]
2026-01-23 02:39:35,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 24 minutes, 14 seconds)
2026-01-23 02:43:11,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:43:13,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 336.15836 ± 107.166
2026-01-23 02:43:13,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [267.77524, 269.83508, 235.40628, 350.64703, 354.0781, 619.5506, 412.73438, 274.86325, 305.45428, 271.23892]
2026-01-23 02:43:13,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [136.0, 146.0, 133.0, 186.0, 176.0, 276.0, 250.0, 159.0, 158.0, 140.0]
2026-01-23 02:43:13,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 20 minutes, 12 seconds)
2026-01-23 02:46:47,998 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:46:51,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 432.87695 ± 195.520
2026-01-23 02:46:51,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [673.535, 345.53085, 523.58453, 518.5618, 640.4493, 329.6246, 683.8509, 98.857285, 332.4483, 182.32706]
2026-01-23 02:46:51,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [224.0, 174.0, 345.0, 263.0, 329.0, 158.0, 274.0, 121.0, 207.0, 117.0]
2026-01-23 02:46:51,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 16 minutes, 58 seconds)
2026-01-23 02:50:29,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:50:31,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 273.17880 ± 112.246
2026-01-23 02:50:31,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [481.54306, 397.153, 184.30891, 176.22005, 281.80164, 268.2807, 406.94766, 222.30615, 123.812744, 189.4143]
2026-01-23 02:50:31,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [273.0, 181.0, 112.0, 125.0, 167.0, 174.0, 207.0, 134.0, 106.0, 108.0]
2026-01-23 02:50:31,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 13 minutes, 39 seconds)
2026-01-23 02:54:09,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:54:13,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 528.25574 ± 156.437
2026-01-23 02:54:13,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [280.88596, 514.0752, 330.03198, 679.6576, 518.39215, 569.2058, 381.18536, 590.24445, 828.215, 590.6635]
2026-01-23 02:54:13,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [178.0, 278.0, 187.0, 266.0, 285.0, 247.0, 187.0, 278.0, 539.0, 304.0]
2026-01-23 02:54:13,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (528.26) for latency DatasetOffice
2026-01-23 02:54:13,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 10 minutes, 24 seconds)
2026-01-23 02:57:45,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:57:48,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 391.34457 ± 170.968
2026-01-23 02:57:48,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [273.18457, 205.35301, 563.0239, 339.45462, 396.53113, 552.6044, 343.65146, 235.19786, 761.36035, 243.08446]
2026-01-23 02:57:48,108 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [143.0, 118.0, 247.0, 211.0, 215.0, 257.0, 188.0, 143.0, 326.0, 137.0]
2026-01-23 02:57:48,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 6 minutes)
2026-01-23 03:01:27,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:01:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 469.56015 ± 194.526
2026-01-23 03:01:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [874.736, 315.26642, 388.46634, 208.55614, 424.53574, 311.8861, 703.9686, 634.5111, 440.58545, 393.08984]
2026-01-23 03:01:30,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [395.0, 163.0, 176.0, 143.0, 220.0, 176.0, 328.0, 400.0, 182.0, 220.0]
2026-01-23 03:01:30,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 3 minutes, 30 seconds)
2026-01-23 03:05:03,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:05:06,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 497.98160 ± 70.282
2026-01-23 03:05:06,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [588.89105, 540.2828, 480.2293, 467.4274, 429.99686, 436.52765, 462.31183, 655.974, 444.33243, 473.84238]
2026-01-23 03:05:06,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [271.0, 267.0, 207.0, 199.0, 182.0, 219.0, 206.0, 242.0, 198.0, 208.0]
2026-01-23 03:05:06,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 59 minutes, 25 seconds)
2026-01-23 03:08:44,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:08:46,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 232.34457 ± 290.842
2026-01-23 03:08:46,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [106.61313, 159.64003, 1006.378, 70.6586, 43.14429, 115.09443, 107.50583, 172.51228, 18.127068, 523.772]
2026-01-23 03:08:46,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [201.0, 187.0, 379.0, 121.0, 53.0, 117.0, 133.0, 285.0, 25.0, 283.0]
2026-01-23 03:08:46,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 55 minutes, 39 seconds)
2026-01-23 03:12:21,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:12:23,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 521.46277 ± 136.392
2026-01-23 03:12:23,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [432.08752, 677.19745, 387.19687, 665.2437, 721.0894, 384.3342, 650.63086, 395.9671, 533.8337, 367.0468]
2026-01-23 03:12:23,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [175.0, 279.0, 187.0, 267.0, 253.0, 168.0, 234.0, 170.0, 207.0, 162.0]
2026-01-23 03:12:23,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 50 minutes, 45 seconds)
2026-01-23 03:16:00,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:16:03,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 442.62372 ± 166.057
2026-01-23 03:16:03,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [128.38712, 768.60406, 360.89453, 355.0871, 526.7385, 434.39798, 482.2369, 333.10046, 628.4122, 408.3784]
2026-01-23 03:16:03,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [119.0, 307.0, 159.0, 199.0, 216.0, 201.0, 196.0, 149.0, 259.0, 184.0]
2026-01-23 03:16:03,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 48 minutes, 30 seconds)
2026-01-23 03:19:44,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:19:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1085.91626 ± 386.621
2026-01-23 03:19:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1177.906, 1574.2958, 702.9375, 840.90216, 783.25024, 613.6642, 1896.915, 1270.1757, 1099.2067, 899.90924]
2026-01-23 03:19:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [499.0, 824.0, 303.0, 308.0, 314.0, 234.0, 827.0, 531.0, 475.0, 442.0]
2026-01-23 03:19:51,345 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1085.92) for latency DatasetOffice
2026-01-23 03:19:51,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 46 minutes, 11 seconds)
2026-01-23 03:23:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:23:27,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 768.08270 ± 363.165
2026-01-23 03:23:27,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1594.5505, 661.26465, 1204.3191, 896.0144, 320.2568, 470.8166, 447.00558, 705.62415, 760.5307, 620.44464]
2026-01-23 03:23:27,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [585.0, 271.0, 476.0, 319.0, 141.0, 241.0, 218.0, 272.0, 302.0, 235.0]
2026-01-23 03:23:27,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 42 minutes, 35 seconds)
2026-01-23 03:27:07,403 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:27:14,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1375.58130 ± 328.021
2026-01-23 03:27:14,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1705.4073, 1498.6675, 807.9233, 1281.9956, 1707.4863, 1251.8921, 929.664, 1345.0778, 1915.7351, 1311.9637]
2026-01-23 03:27:14,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [648.0, 566.0, 350.0, 481.0, 647.0, 455.0, 354.0, 488.0, 748.0, 467.0]
2026-01-23 03:27:14,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1375.58) for latency DatasetOffice
2026-01-23 03:27:14,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 40 minutes, 39 seconds)
2026-01-23 03:30:51,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:30:58,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1204.96509 ± 626.770
2026-01-23 03:30:58,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1451.8269, 900.6193, 2674.7888, 1138.2101, 1927.032, 645.39825, 1135.3304, 957.4014, 598.1458, 620.8976]
2026-01-23 03:30:58,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [593.0, 355.0, 1000.0, 504.0, 703.0, 275.0, 463.0, 390.0, 255.0, 265.0]
2026-01-23 03:30:58,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 38 minutes, 31 seconds)
2026-01-23 03:34:37,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:34:40,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 530.56641 ± 461.897
2026-01-23 03:34:40,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [766.2443, 543.3305, 1233.1783, 1198.8334, 758.49207, 728.84015, 17.898338, 31.429007, 17.136461, 10.281824]
2026-01-23 03:34:40,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [282.0, 211.0, 432.0, 441.0, 303.0, 272.0, 25.0, 35.0, 24.0, 18.0]
2026-01-23 03:34:40,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 35 minutes, 21 seconds)
2026-01-23 03:38:11,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:38:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1971.23865 ± 849.531
2026-01-23 03:38:21,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3239.5076, 1488.2236, 2057.5918, 2665.803, 1947.5055, 3413.9248, 771.11523, 928.22955, 1600.8162, 1599.6696]
2026-01-23 03:38:21,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [887.0, 476.0, 631.0, 831.0, 663.0, 1000.0, 328.0, 319.0, 621.0, 530.0]
2026-01-23 03:38:21,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (1971.24) for latency DatasetOffice
2026-01-23 03:38:21,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 30 minutes, 2 seconds)
2026-01-23 03:41:57,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:42:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2091.32397 ± 858.847
2026-01-23 03:42:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1214.5498, 3384.7334, 1607.4828, 1341.4156, 2508.666, 2729.0015, 3011.6948, 2497.2124, 496.1825, 2122.3035]
2026-01-23 03:42:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [408.0, 1000.0, 503.0, 404.0, 659.0, 796.0, 805.0, 739.0, 200.0, 661.0]
2026-01-23 03:42:06,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2091.32) for latency DatasetOffice
2026-01-23 03:42:06,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 28 minutes, 33 seconds)
2026-01-23 03:45:48,502 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:45:55,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1610.19116 ± 1020.859
2026-01-23 03:45:55,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3557.921, 1938.9009, 1476.6353, 3338.4453, 1202.1077, 87.18909, 1044.6998, 1114.9205, 973.81116, 1367.2806]
2026-01-23 03:45:55,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 510.0, 456.0, 947.0, 368.0, 78.0, 339.0, 366.0, 344.0, 407.0]
2026-01-23 03:45:55,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 25 minutes, 13 seconds)
2026-01-23 03:49:27,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:49:34,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1701.68774 ± 1044.549
2026-01-23 03:49:34,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1204.2595, 1331.015, 2237.9602, 3968.3281, 1665.1907, 582.414, 1658.6119, 2998.971, 902.7635, 467.36392]
2026-01-23 03:49:34,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [351.0, 375.0, 638.0, 1000.0, 502.0, 225.0, 463.0, 750.0, 281.0, 202.0]
2026-01-23 03:49:34,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 20 minutes, 29 seconds)
2026-01-23 03:53:20,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:53:28,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2009.90356 ± 1022.905
2026-01-23 03:53:28,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1244.8992, 1390.2627, 3035.4514, 2304.6409, 682.94745, 665.6144, 3666.8987, 2806.0115, 1324.024, 2978.2842]
2026-01-23 03:53:28,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [381.0, 381.0, 784.0, 630.0, 243.0, 285.0, 1000.0, 785.0, 408.0, 928.0]
2026-01-23 03:53:28,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 19 minutes, 32 seconds)
2026-01-23 03:57:00,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 03:57:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2776.09082 ± 983.739
2026-01-23 03:57:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4120.1006, 2742.0493, 3243.3987, 1871.652, 3792.452, 1765.0864, 3993.933, 2574.556, 974.4189, 2683.2625]
2026-01-23 03:57:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 705.0, 823.0, 506.0, 1000.0, 474.0, 1000.0, 650.0, 305.0, 630.0]
2026-01-23 03:57:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2776.09) for latency DatasetOffice
2026-01-23 03:57:10,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 16 minutes, 1 second)
2026-01-23 04:00:52,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:00:59,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1746.13965 ± 1349.491
2026-01-23 04:00:59,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3396.601, 38.803364, 2327.5432, 1090.4126, 26.359951, 945.7278, 3421.8079, 1779.9253, 3790.8477, 643.36743]
2026-01-23 04:00:59,690 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [865.0, 61.0, 623.0, 370.0, 32.0, 315.0, 873.0, 503.0, 972.0, 227.0]
2026-01-23 04:00:59,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 13 minutes, 4 seconds)
2026-01-23 04:04:49,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:04:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2508.65918 ± 1374.416
2026-01-23 04:04:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1589.4081, 4025.2373, 3907.3342, 667.0075, 1947.1359, 3123.0576, 2422.3904, 4294.9795, 3085.8938, 24.147228]
2026-01-23 04:04:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [452.0, 1000.0, 1000.0, 232.0, 503.0, 816.0, 734.0, 1000.0, 782.0, 35.0]
2026-01-23 04:04:59,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 11 minutes, 40 seconds)
2026-01-23 04:08:27,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:08:36,470 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2308.15576 ± 1463.449
2026-01-23 04:08:36,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3989.983, 228.90147, 2581.368, 4132.8916, 2091.5706, 3039.0083, 4167.9346, 1620.6711, 1167.8044, 61.42286]
2026-01-23 04:08:36,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 208.0, 643.0, 1000.0, 535.0, 763.0, 1000.0, 447.0, 378.0, 141.0]
2026-01-23 04:08:36,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 7 minutes, 26 seconds)
2026-01-23 04:12:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:12:31,050 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2848.15283 ± 1318.937
2026-01-23 04:12:31,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1997.5183, 764.57874, 4184.6504, 3978.0286, 3720.1472, 1431.9462, 1007.5373, 4037.9216, 4077.2175, 3281.9832]
2026-01-23 04:12:31,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [549.0, 249.0, 1000.0, 1000.0, 1000.0, 422.0, 294.0, 1000.0, 1000.0, 837.0]
2026-01-23 04:12:31,051 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2848.15) for latency DatasetOffice
2026-01-23 04:12:31,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 3 minutes, 45 seconds)
2026-01-23 04:16:11,748 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:16:22,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2948.63672 ± 1475.911
2026-01-23 04:16:22,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1159.2, 1189.317, 4369.6777, 4082.2239, 4197.018, 3651.2148, 4036.172, 4506.9795, 1266.5948, 1027.9697]
2026-01-23 04:16:22,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [347.0, 374.0, 1000.0, 1000.0, 1000.0, 935.0, 1000.0, 1000.0, 350.0, 323.0]
2026-01-23 04:16:22,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (2948.64) for latency DatasetOffice
2026-01-23 04:16:22,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 1 minute, 53 seconds)
2026-01-23 04:19:42,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:19:51,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2430.97705 ± 1310.977
2026-01-23 04:19:51,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3823.8496, 4247.616, 2257.9792, 848.69464, 3071.0408, 4274.3345, 689.0433, 1552.2084, 1090.0844, 2454.92]
2026-01-23 04:19:51,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [935.0, 1000.0, 624.0, 260.0, 788.0, 976.0, 242.0, 447.0, 352.0, 625.0]
2026-01-23 04:19:51,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 53 minutes, 50 seconds)
2026-01-23 04:23:44,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:23:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3449.22510 ± 1029.324
2026-01-23 04:23:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4408.62, 4338.5635, 2529.113, 4132.76, 4165.2285, 1390.4872, 2733.8486, 4230.3086, 4187.4243, 2375.896]
2026-01-23 04:23:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 624.0, 1000.0, 1000.0, 396.0, 623.0, 1000.0, 1000.0, 590.0]
2026-01-23 04:23:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3449.23) for latency DatasetOffice
2026-01-23 04:23:56,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 51 minutes, 8 seconds)
2026-01-23 04:27:21,665 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:27:34,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3645.60303 ± 1001.128
2026-01-23 04:27:34,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2552.8, 3984.3142, 4208.627, 3989.0203, 4458.134, 2614.5684, 4286.714, 1448.395, 4436.042, 4477.4194]
2026-01-23 04:27:34,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [627.0, 1000.0, 1000.0, 1000.0, 1000.0, 682.0, 1000.0, 410.0, 1000.0, 1000.0]
2026-01-23 04:27:34,089 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (3645.60) for latency DatasetOffice
2026-01-23 04:27:34,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 47 minutes, 31 seconds)
2026-01-23 04:31:20,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:31:30,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3193.82690 ± 1539.454
2026-01-23 04:31:30,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4042.5188, 211.34387, 2617.985, 388.2999, 3674.1018, 4471.9565, 4369.6987, 4308.1006, 3528.9617, 4325.3047]
2026-01-23 04:31:30,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 117.0, 628.0, 161.0, 843.0, 1000.0, 1000.0, 1000.0, 819.0, 1000.0]
2026-01-23 04:31:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 44 minutes, 9 seconds)
2026-01-23 04:35:05,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:35:19,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4583.43408 ± 138.025
2026-01-23 04:35:19,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4434.718, 4627.258, 4560.106, 4523.396, 4571.6123, 4802.8203, 4300.447, 4752.475, 4652.174, 4609.331]
2026-01-23 04:35:19,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:35:19,981 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4583.43) for latency DatasetOffice
2026-01-23 04:35:19,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 39 minutes, 55 seconds)
2026-01-23 04:38:54,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:39:05,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3459.55981 ± 1357.643
2026-01-23 04:39:05,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4274.7075, 2700.0002, 4630.9907, 2172.6646, 4353.358, 152.8485, 4357.162, 4192.3154, 4485.556, 3275.9956]
2026-01-23 04:39:05,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 645.0, 1000.0, 554.0, 1000.0, 98.0, 1000.0, 976.0, 1000.0, 769.0]
2026-01-23 04:39:05,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 39 minutes, 21 seconds)
2026-01-23 04:42:33,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:42:40,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 1905.50366 ± 1882.390
2026-01-23 04:42:40,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3814.2058, 4071.3557, 3517.1714, 3421.4312, 41.778774, 27.149492, 13.405632, 69.66836, 12.558734, 4066.312]
2026-01-23 04:42:40,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 880.0, 50.0, 29.0, 20.0, 55.0, 20.0, 1000.0]
2026-01-23 04:42:41,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 29 minutes, 59 seconds)
2026-01-23 04:46:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:46:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3589.97461 ± 1108.754
2026-01-23 04:46:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [3790.7659, 4085.8647, 4125.619, 4137.8125, 3888.939, 3987.9329, 284.67874, 3817.8794, 3977.3733, 3802.8823]
2026-01-23 04:46:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:46:27,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 27 minutes, 42 seconds)
2026-01-23 04:49:53,186 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:50:06,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4007.18481 ± 617.836
2026-01-23 04:50:06,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4233.9653, 4254.086, 4200.8276, 4035.4287, 4087.4216, 4270.5625, 2169.9883, 4269.6655, 4235.345, 4314.5576]
2026-01-23 04:50:06,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 580.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:50:06,964 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 20 minutes, 53 seconds)
2026-01-23 04:53:47,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:53:57,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2972.67114 ± 1878.363
2026-01-23 04:53:57,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4365.9805, 4208.079, 4304.452, 2734.0327, -1.2352103, 609.0518, 35.165413, 4532.3804, 4393.6045, 4545.1987]
2026-01-23 04:53:57,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 655.0, 11.0, 229.0, 48.0, 980.0, 1000.0, 1000.0]
2026-01-23 04:53:57,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 17 minutes, 24 seconds)
2026-01-23 04:57:37,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 04:57:51,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4333.76123 ± 105.801
2026-01-23 04:57:51,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4332.621, 4406.2056, 4083.006, 4297.069, 4494.1216, 4351.165, 4400.591, 4390.6685, 4341.575, 4240.5913]
2026-01-23 04:57:51,530 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 931.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 04:57:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 15 minutes, 8 seconds)
2026-01-23 05:01:26,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:01:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3853.90942 ± 1467.149
2026-01-23 05:01:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4389.6636, 2664.8271, 4817.4756, 4736.711, 4653.542, 2854.2078, 4675.5015, 4882.231, 4745.931, 119.00213]
2026-01-23 05:01:37,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 608.0, 1000.0, 1000.0, 1000.0, 684.0, 1000.0, 1000.0, 1000.0, 75.0]
2026-01-23 05:01:37,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 13 minutes, 16 seconds)
2026-01-23 05:05:12,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:05:23,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3292.06836 ± 1424.510
2026-01-23 05:05:23,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2372.1667, 2623.1116, 23.87739, 1930.9003, 4314.368, 4380.6567, 4311.0767, 4299.0786, 4422.774, 4242.6753]
2026-01-23 05:05:23,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [597.0, 660.0, 36.0, 536.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:05:23,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 9 minutes, 23 seconds)
2026-01-23 05:08:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:09:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4519.83203 ± 356.501
2026-01-23 05:09:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4626.707, 4529.607, 4754.6836, 3487.9607, 4725.1763, 4706.383, 4541.521, 4673.7026, 4702.8037, 4449.7725]
2026-01-23 05:09:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 782.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:09:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 7 minutes, 12 seconds)
2026-01-23 05:12:34,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:12:47,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4309.77246 ± 1176.076
2026-01-23 05:12:47,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [797.817, 4725.789, 4703.301, 4722.8154, 4705.4233, 4852.257, 4580.2554, 4810.0586, 4767.9907, 4432.0166]
2026-01-23 05:12:47,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [249.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:12:47,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 51 seconds)
2026-01-23 05:16:22,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:16:35,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4038.00732 ± 794.619
2026-01-23 05:16:35,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [2097.585, 4438.154, 4542.2017, 2888.8523, 4490.69, 4388.695, 4395.0864, 4455.3057, 4343.114, 4340.39]
2026-01-23 05:16:35,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [511.0, 1000.0, 1000.0, 694.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:16:35,980 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2026-01-23 05:20:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:20:22,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4254.19629 ± 1241.648
2026-01-23 05:20:22,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4800.375, 4829.2715, 4741.087, 4967.1094, 4828.9956, 2633.711, 1105.9075, 4878.2495, 4827.6426, 4929.6147]
2026-01-23 05:20:22,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 604.0, 304.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:20:22,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 52 minutes, 22 seconds)
2026-01-23 05:23:59,379 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:24:06,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2284.82861 ± 2178.806
2026-01-23 05:24:06,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [28.022133, -0.350127, 10.133235, 509.82928, 35.10277, 4580.939, 4543.64, 4663.299, 4012.033, 4465.6367]
2026-01-23 05:24:06,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [30.0, 11.0, 18.0, 194.0, 36.0, 1000.0, 1000.0, 1000.0, 880.0, 1000.0]
2026-01-23 05:24:06,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 48 minutes, 28 seconds)
2026-01-23 05:27:39,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:27:52,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4608.13916 ± 699.570
2026-01-23 05:27:52,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4835.6113, 2523.3013, 4722.898, 4987.5376, 4792.5137, 4865.811, 4898.215, 4842.3657, 4705.4824, 4907.6567]
2026-01-23 05:27:52,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 559.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:27:52,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4608.14) for latency DatasetOffice
2026-01-23 05:27:52,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 44 minutes, 14 seconds)
2026-01-23 05:31:26,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:31:39,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4260.57275 ± 1254.441
2026-01-23 05:31:39,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4884.847, 3175.2227, 4778.9355, 4808.584, 4842.82, 5024.1216, 4504.252, 4897.191, 819.4837, 4870.2637]
2026-01-23 05:31:39,143 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 692.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 253.0, 1000.0]
2026-01-23 05:31:39,151 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 42 minutes, 10 seconds)
2026-01-23 05:35:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:35:13,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 2659.41748 ± 2008.484
2026-01-23 05:35:13,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [601.5768, 159.74062, 242.26627, 31.48044, 4541.25, 3069.7434, 4611.8726, 4361.603, 4429.6426, 4544.9995]
2026-01-23 05:35:13,402 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [206.0, 111.0, 127.0, 48.0, 1000.0, 726.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:35:13,410 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 36 minutes, 26 seconds)
2026-01-23 05:38:44,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:38:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4481.32129 ± 777.239
2026-01-23 05:38:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4898.437, 5021.917, 4566.716, 3949.4104, 4938.575, 4830.095, 4768.333, 4630.8257, 2317.732, 4891.173]
2026-01-23 05:38:58,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 546.0, 1000.0]
2026-01-23 05:38:58,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 32 minutes, 29 seconds)
2026-01-23 05:42:32,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:42:45,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4415.14307 ± 1081.670
2026-01-23 05:42:45,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4738.9673, 4802.6665, 4839.082, 1172.8015, 4800.2344, 4817.671, 4770.281, 4749.1694, 4675.837, 4784.72]
2026-01-23 05:42:45,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 316.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:42:45,357 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 29 minutes, 9 seconds)
2026-01-23 05:46:33,415 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:46:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4852.43213 ± 82.005
2026-01-23 05:46:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4849.131, 4857.52, 5047.215, 4881.691, 4801.478, 4721.605, 4866.3203, 4762.951, 4869.8955, 4866.512]
2026-01-23 05:46:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:46:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (4852.43) for latency DatasetOffice
2026-01-23 05:46:47,525 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 27 minutes, 29 seconds)
2026-01-23 05:50:21,016 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:50:34,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4524.84180 ± 873.819
2026-01-23 05:50:34,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4769.268, 4996.991, 4797.0215, 4848.5327, 4794.881, 4812.6064, 4888.4736, 1921.1211, 4571.2544, 4848.2656]
2026-01-23 05:50:34,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 454.0, 1000.0, 1000.0]
2026-01-23 05:50:34,514 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 23 minutes, 48 seconds)
2026-01-23 05:53:51,730 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:54:04,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4347.90625 ± 1169.134
2026-01-23 05:54:04,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4787.901, 4874.8306, 4854.0537, 4865.257, 4864.0176, 3881.3594, 946.8326, 4759.843, 4855.494, 4789.4785]
2026-01-23 05:54:04,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 814.0, 299.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:54:04,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 19 minutes, 31 seconds)
2026-01-23 05:57:38,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 05:57:52,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4801.58301 ± 66.282
2026-01-23 05:57:52,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4783.646, 4709.9097, 4719.957, 4902.007, 4881.3438, 4795.3154, 4750.5703, 4751.8, 4861.392, 4859.893]
2026-01-23 05:57:52,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 05:57:52,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2026-01-23 06:01:37,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:01:50,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4495.71777 ± 817.309
2026-01-23 06:01:50,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4837.7734, 4852.9243, 4702.7627, 4811.7114, 4441.909, 2105.467, 4842.2036, 4519.0703, 4706.117, 5137.2437]
2026-01-23 06:01:50,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 930.0, 491.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:01:50,978 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 13 minutes, 39 seconds)
2026-01-23 06:05:21,364 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:05:33,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3855.33081 ± 1591.873
2026-01-23 06:05:33,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [24.456026, 1449.8623, 4642.461, 4647.345, 4604.709, 4635.732, 4708.248, 4587.8486, 4690.966, 4561.678]
2026-01-23 06:05:33,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [45.0, 373.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:05:33,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 7 minutes, 34 seconds)
2026-01-23 06:09:07,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:09:21,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4700.80469 ± 24.090
2026-01-23 06:09:21,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4688.5913, 4682.253, 4708.2935, 4759.986, 4702.6084, 4690.64, 4694.455, 4696.021, 4719.4995, 4665.699]
2026-01-23 06:09:21,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:09:21,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 3 minutes, 56 seconds)
2026-01-23 06:12:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:12:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4261.89160 ± 1293.357
2026-01-23 06:12:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4784.058, 812.5242, 4595.7256, 4912.434, 5033.819, 5006.8413, 4875.557, 4835.3296, 4849.908, 2912.7183]
2026-01-23 06:12:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 248.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 639.0]
2026-01-23 06:12:59,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 59 seconds)
2026-01-23 06:16:38,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:16:48,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3391.94287 ± 1660.014
2026-01-23 06:16:48,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4739.305, 1443.7981, 4739.51, 4704.8438, 4740.6533, 1362.0991, 4855.795, 1370.6595, 4699.1323, 1263.6333]
2026-01-23 06:16:48,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 365.0, 1000.0, 1000.0, 1000.0, 363.0, 1000.0, 367.0, 1000.0, 350.0]
2026-01-23 06:16:48,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 57 minutes, 22 seconds)
2026-01-23 06:20:22,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:20:34,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3980.52222 ± 1157.225
2026-01-23 06:20:34,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4873.6167, 2774.506, 5024.8174, 4944.3037, 4832.416, 3743.3901, 2562.9983, 4429.389, 4926.66, 1693.1234]
2026-01-23 06:20:34,119 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 580.0, 1000.0, 1000.0, 1000.0, 792.0, 578.0, 901.0, 1000.0, 406.0]
2026-01-23 06:20:34,131 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 52 minutes, 18 seconds)
2026-01-23 06:24:03,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:24:15,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3642.22534 ± 1510.597
2026-01-23 06:24:15,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4217.726, 509.89407, 4939.3594, 4712.7437, 4828.6455, 4942.7466, 4821.906, 2994.56, 2910.6655, 1544.0084]
2026-01-23 06:24:15,035 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [865.0, 188.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 658.0, 644.0, 394.0]
2026-01-23 06:24:15,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 48 minutes, 26 seconds)
2026-01-23 06:27:57,687 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:28:11,260 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4402.42432 ± 649.844
2026-01-23 06:28:11,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4758.676, 4626.544, 4606.622, 4466.2188, 4631.7705, 4571.235, 4623.761, 4666.285, 4609.1606, 2463.9697]
2026-01-23 06:28:11,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 575.0]
2026-01-23 06:28:11,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 45 minutes, 27 seconds)
2026-01-23 06:31:32,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:31:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4257.39062 ± 1159.970
2026-01-23 06:31:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4543.6777, 4733.2026, 4547.991, 4762.2886, 4668.7246, 4644.919, 786.4095, 4598.997, 4531.802, 4755.8936]
2026-01-23 06:31:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 252.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:31:45,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 41 minutes, 25 seconds)
2026-01-23 06:35:21,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:35:35,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4616.52051 ± 73.935
2026-01-23 06:35:35,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4480.2803, 4746.986, 4631.106, 4670.492, 4592.3237, 4616.943, 4569.6553, 4546.5435, 4604.2183, 4706.6606]
2026-01-23 06:35:35,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:35:35,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 37 minutes, 41 seconds)
2026-01-23 06:39:09,376 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:39:23,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 5025.39307 ± 50.167
2026-01-23 06:39:23,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5063.775, 5081.5073, 5052.177, 5033.496, 4987.6807, 5059.624, 5065.164, 4916.402, 5029.462, 4964.64]
2026-01-23 06:39:23,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:39:23,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (5025.39) for latency DatasetOffice
2026-01-23 06:39:23,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 34 minutes, 7 seconds)
2026-01-23 06:43:07,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:43:18,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3634.72510 ± 1636.426
2026-01-23 06:43:18,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4626.8003, 4678.221, 734.7899, 4733.4155, 2345.8943, 604.387, 4576.9854, 4810.1084, 4475.6772, 4760.9707]
2026-01-23 06:43:18,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 246.0, 1000.0, 551.0, 233.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:43:18,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 31 minutes, 30 seconds)
2026-01-23 06:46:38,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:46:52,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4924.62402 ± 91.960
2026-01-23 06:46:52,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5068.2007, 4975.1, 5030.1377, 4856.8706, 4849.9443, 4808.24, 4937.933, 4840.8794, 5036.749, 4842.1855]
2026-01-23 06:46:52,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 966.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:46:52,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 25 minutes, 55 seconds)
2026-01-23 06:50:30,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:50:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4716.66650 ± 689.658
2026-01-23 06:50:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4939.768, 5037.665, 2660.4316, 5058.799, 4933.068, 5024.823, 4888.841, 4785.78, 4933.4624, 4904.027]
2026-01-23 06:50:44,220 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 608.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:50:44,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 23 minutes, 28 seconds)
2026-01-23 06:54:18,165 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:54:30,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3839.06177 ± 1675.917
2026-01-23 06:54:30,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [1013.7073, 24.109825, 4574.0547, 4652.6, 4726.339, 4601.1045, 4805.2856, 4625.0947, 4702.315, 4666.004]
2026-01-23 06:54:30,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [300.0, 35.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:54:30,036 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 19 minutes, 23 seconds)
2026-01-23 06:58:03,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 06:58:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4980.07520 ± 108.498
2026-01-23 06:58:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4955.827, 4936.7925, 4928.8735, 4791.06, 5117.5938, 4861.4854, 5145.464, 5062.5, 4933.697, 5067.459]
2026-01-23 06:58:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 06:58:18,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 15 minutes, 37 seconds)
2026-01-23 07:01:43,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:01:56,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4173.82764 ± 1401.037
2026-01-23 07:01:56,023 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4208.7793, 4804.792, 4843.538, 4687.961, 4816.291, 4740.8027, 4824.858, 4627.4795, 4151.509, 32.269077]
2026-01-23 07:01:56,024 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [873.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 854.0, 35.0]
2026-01-23 07:01:56,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 10 minutes, 45 seconds)
2026-01-23 07:05:49,320 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:06:03,550 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4739.32568 ± 59.967
2026-01-23 07:06:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4783.254, 4720.165, 4814.6406, 4784.5522, 4710.6357, 4601.1914, 4716.101, 4704.337, 4802.6333, 4755.7456]
2026-01-23 07:06:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:06:03,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 9 minutes, 5 seconds)
2026-01-23 07:09:36,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:09:49,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4464.99316 ± 961.194
2026-01-23 07:09:49,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4680.1445, 4766.7524, 4799.4937, 4794.6577, 4724.0425, 1585.4589, 4800.2456, 4877.98, 4824.321, 4796.8384]
2026-01-23 07:09:49,463 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 416.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:09:49,473 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 4 minutes, 53 seconds)
2026-01-23 07:13:10,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:13:22,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3790.69189 ± 1753.247
2026-01-23 07:13:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4763.3335, 4789.3867, 4850.6143, 3567.9583, 710.67554, 37.797997, 4782.441, 4820.8066, 4761.706, 4822.197]
2026-01-23 07:13:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 786.0, 257.0, 46.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:13:22,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 22 seconds)
2026-01-23 07:17:10,818 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:17:25,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4878.56738 ± 82.471
2026-01-23 07:17:25,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4892.4395, 4887.609, 4899.8706, 4950.5493, 4766.7837, 4928.9224, 4738.155, 4786.189, 4928.984, 5006.17]
2026-01-23 07:17:25,426 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:17:25,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 57 minutes, 22 seconds)
2026-01-23 07:20:57,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:21:11,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4894.10889 ± 84.887
2026-01-23 07:21:11,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4891.433, 4720.6265, 4932.6514, 4808.264, 4872.6655, 5013.9087, 4985.259, 4953.879, 4936.311, 4826.091]
2026-01-23 07:21:11,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:21:11,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 53 minutes, 56 seconds)
2026-01-23 07:24:45,643 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:25:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4827.99463 ± 98.414
2026-01-23 07:25:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4787.6113, 4705.227, 4761.957, 4750.1978, 4811.812, 4833.3813, 5005.9775, 4956.762, 4936.2847, 4730.7334]
2026-01-23 07:25:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:25:00,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 49 minutes, 14 seconds)
2026-01-23 07:28:36,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:28:49,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4704.98340 ± 936.716
2026-01-23 07:28:49,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5045.0776, 4992.9834, 5065.1133, 5011.5493, 5004.0024, 5047.6396, 4995.5435, 5025.7637, 1896.1085, 4966.052]
2026-01-23 07:28:49,714 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 445.0, 1000.0]
2026-01-23 07:28:49,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 45 minutes, 36 seconds)
2026-01-23 07:32:35,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:32:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3316.25537 ± 2130.834
2026-01-23 07:32:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5037.059, 5020.094, 5013.7266, 4956.1943, 5013.149, 2887.064, 0.7916066, 162.09396, 444.80136, 4627.5815]
2026-01-23 07:32:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 620.0, 11.0, 87.0, 195.0, 1000.0]
2026-01-23 07:32:45,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 42 minutes, 38 seconds)
2026-01-23 07:36:14,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:36:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 5088.26367 ± 78.975
2026-01-23 07:36:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5052.4863, 5066.4966, 5076.368, 5168.379, 5121.9546, 4886.7783, 5098.5195, 5083.5454, 5145.2266, 5182.883]
2026-01-23 07:36:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:36:29,511 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1274 [INFO]: New best (5088.26) for latency DatasetOffice
2026-01-23 07:36:29,522 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 38 minutes, 8 seconds)
2026-01-23 07:40:05,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:40:20,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4866.12744 ± 354.901
2026-01-23 07:40:20,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5042.0737, 4999.1406, 4907.946, 4989.4233, 5030.3164, 4993.043, 4957.432, 3807.1035, 4951.5254, 4983.272]
2026-01-23 07:40:20,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 779.0, 1000.0, 1000.0]
2026-01-23 07:40:20,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 34 minutes, 26 seconds)
2026-01-23 07:43:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:44:08,373 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4353.98096 ± 1177.229
2026-01-23 07:44:08,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [5087.4355, 3730.8306, 2963.312, 1513.4563, 4788.884, 5190.4517, 5075.4497, 5090.7217, 5119.9136, 4979.351]
2026-01-23 07:44:08,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 746.0, 627.0, 388.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:44:08,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 30 minutes, 37 seconds)
2026-01-23 07:47:44,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:47:58,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4649.54053 ± 901.314
2026-01-23 07:47:58,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4900.3604, 1948.7506, 4958.727, 4964.2847, 4894.165, 4886.3076, 4985.996, 4980.686, 5032.093, 4944.0347]
2026-01-23 07:47:58,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 489.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:47:58,606 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 48 seconds)
2026-01-23 07:51:34,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:51:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4455.56934 ± 1191.396
2026-01-23 07:51:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [890.69775, 4980.4644, 4712.212, 4940.1743, 4856.0317, 4942.6206, 4820.181, 4765.9565, 4746.738, 4900.6157]
2026-01-23 07:51:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [259.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:51:47,435 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 50 seconds)
2026-01-23 07:55:32,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:55:46,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4603.31934 ± 872.367
2026-01-23 07:55:46,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4712.3926, 4906.452, 4992.4644, 4880.837, 4909.7544, 1999.3721, 5045.474, 4808.846, 4851.2754, 4926.323]
2026-01-23 07:55:46,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 493.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 07:55:46,202 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 16 seconds)
2026-01-23 07:58:59,963 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 07:59:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4348.81787 ± 1342.075
2026-01-23 07:59:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4879.2515, 4812.313, 4851.267, 4896.697, 4865.918, 4535.8447, 4649.4194, 4873.648, 336.0857, 4787.735]
2026-01-23 07:59:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 155.0, 1000.0]
2026-01-23 07:59:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 6 seconds)
2026-01-23 08:02:51,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:03:00,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 3064.87720 ± 1949.560
2026-01-23 08:03:00,759 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [171.3039, 803.7492, 4745.932, 4740.2417, 4905.0293, 4973.7256, 4785.6665, 2651.0562, 2743.912, 128.15645]
2026-01-23 08:03:00,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [97.0, 246.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 589.0, 655.0, 128.0]
2026-01-23 08:03:00,773 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 19 seconds)
2026-01-23 08:06:39,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:06:54,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4785.33496 ± 72.498
2026-01-23 08:06:54,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4809.0996, 4804.8486, 4628.107, 4806.761, 4777.722, 4722.359, 4878.5244, 4721.008, 4835.489, 4869.435]
2026-01-23 08:06:54,034 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 08:06:54,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 34 seconds)
2026-01-23 08:10:37,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:10:49,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4013.37964 ± 1656.625
2026-01-23 08:10:49,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4818.1543, 4873.9126, 4851.802, 4884.8325, 4787.158, 808.85095, 4730.661, 4880.8975, 4900.527, 596.99945]
2026-01-23 08:10:49,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 246.0, 1000.0, 1000.0, 1000.0, 219.0]
2026-01-23 08:10:49,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 48 seconds)
2026-01-23 08:14:18,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 08:14:32,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1269 [DEBUG]: Total Reward: 4675.01709 ± 239.418
2026-01-23 08:14:32,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1270 [DEBUG]: All rewards: [4675.718, 4810.714, 4772.6533, 4773.209, 4839.235, 4677.498, 4681.071, 4800.4814, 3976.2976, 4743.296]
2026-01-23 08:14:32,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 854.0, 1000.0]
2026-01-23 08:14:32,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-walker2d):1299 [DEBUG]: Training session finished
