nohup: ignoring input
| distributed init (rank 0): env://
| distributed init (rank 1): env://
Namespace(ThreeAugment=False, aa='rand-m9-mstd0.5-inc1', attn_only=False, batch_size=128, bce_loss=False, clip_grad=None, color_jitter=0.3, cooldown_epochs=10, cosub=False, cutmix=1.0, cutmix_minmax=None, data_path='/local/storage/ding/cifar100', data_set='CIFAR', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distillation_alpha=0.5, distillation_tau=1.0, distillation_type='none', distributed=True, drop=0.0, drop_path=0.1, epochs=30, eval=False, eval_crop_ratio=0.875, finetune='/home/zhu.3723/kronvit/output/cifar100_kron30/deit_tiny_patch16_224_30/best_checkpoint.pth', gpu=0, inat_category='name', input_size=224, kron=True, kron_a_freeze=False, kron_b_freeze=True, kron_rank=50, lr=0.001, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=0.0001, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='kron_deit_tiny_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/home/zhu.3723/kronvit/output/cifar100_kron30/deit_tiny_patch16_224_30/', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, shape_bias=3, smoothing=0.1, src=False, start_epoch=0, teacher_model='regnety_160', teacher_path='', train_interpolation='bicubic', train_mode=True, unscale_lr=False, warmup_epochs=5, warmup_lr=0.001, weight_decay=0.05, world_size=2)
Files already downloaded and verified
Files already downloaded and verified
Creating model: kron_deit_tiny_patch16_224
kron_config: {'rank_rate': 6.0, 'structured_sparse': True, 'bias': False, 'shape_bias': 3, 'rank': 50}
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 48]) torch.Size([50, 48, 12])
torch.Size([50, 4, 48]) torch.Size([50, 48, 4])
torch.Size([50, 4, 96]) torch.Size([50, 48, 8])
torch.Size([50, 8, 48]) torch.Size([50, 96, 4])
torch.Size([50, 4, 50]) torch.Size([50, 48, 2])
Traceback (most recent call last):
  File "main.py", line 507, in <module>
    main(args)
  File "main.py", line 324, in main
    model.load_state_dict(checkpoint_model, strict=False)
  File "/home/zhu.3723/anaconda3/envs/savit2.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VisionTransformer:
	size mismatch for blocks.0.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.0.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.0.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.0.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.0.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.0.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.0.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.0.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.1.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.1.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.1.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.1.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.1.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.1.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.1.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.1.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.2.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.2.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.2.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.2.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.2.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.2.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.2.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.2.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.3.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.3.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.3.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.3.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.3.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.3.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.3.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.3.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.4.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.4.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.4.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.4.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.4.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.4.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.4.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.4.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.5.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.5.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.5.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.5.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.5.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.5.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.5.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.5.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.6.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.6.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.6.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.6.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.6.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.6.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.6.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.6.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.7.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.7.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.7.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.7.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.7.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.7.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.7.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.7.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.8.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.8.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.8.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.8.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.8.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.8.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.8.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.8.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.9.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.9.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.9.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.9.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.9.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.9.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.9.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.9.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.10.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.10.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.10.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.10.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.10.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.10.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.10.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.10.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.11.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.11.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.11.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.11.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.11.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.11.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.11.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.11.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for head.a: copying a param with shape torch.Size([30, 4, 50]) from checkpoint, the shape in current model is torch.Size([50, 4, 50]).
	size mismatch for head.b: copying a param with shape torch.Size([30, 48, 2]) from checkpoint, the shape in current model is torch.Size([50, 48, 2]).
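
Note on the error above: every mismatch has a leading dimension of 30 in the checkpoint and 50 in the current model, and the Namespace line shows kron_rank=50 while the finetune path points into cifar100_kron30, so the checkpoint was presumably saved with rank 30. load_state_dict(strict=False) tolerates missing and unexpected keys but still raises on size mismatches. Building the model with the rank the checkpoint was trained with is likely the simplest fix. If a partial load is really intended, a minimal sketch like the following could drop the mismatched entries first. The helper name split_by_shape and the key blocks.0.norm1.weight are hypothetical, and the stand-in objects only mimic the .shape attribute of tensors so the sketch runs without PyTorch.

```python
from types import SimpleNamespace


def split_by_shape(checkpoint_state, model_state):
    """Separate checkpoint entries into loadable and shape-mismatched ones.

    Hypothetical helper, not part of main.py or PyTorch. Both arguments map
    parameter names to objects exposing a .shape attribute.
    """
    loadable, skipped = {}, []
    for name, value in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(value.shape) != tuple(target.shape):
            # Same key, different shape: this is what triggers the RuntimeError.
            skipped.append((name, tuple(value.shape), tuple(target.shape)))
        else:
            # Matching shape, or a key the model lacks (strict=False ignores it).
            loadable[name] = value
    return loadable, skipped


# Stand-ins for tensors; shapes for qkv.a are taken from the log above,
# the norm1 entry is an illustrative assumption.
checkpoint_state = {
    "blocks.0.attn.qkv.a": SimpleNamespace(shape=(30, 4, 48)),
    "blocks.0.norm1.weight": SimpleNamespace(shape=(192,)),
}
model_state = {
    "blocks.0.attn.qkv.a": SimpleNamespace(shape=(50, 4, 48)),
    "blocks.0.norm1.weight": SimpleNamespace(shape=(192,)),
}

loadable, skipped = split_by_shape(checkpoint_state, model_state)
for name, ckpt_shape, model_shape in skipped:
    print(f"skipping {name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With real tensors the same function could be called as split_by_shape(checkpoint_model, model.state_dict()) and the returned loadable dict passed to model.load_state_dict(loadable, strict=False). Note that in this run all a and b factors would be skipped and keep their fresh initialization, so most of the fine-tuned weights would not be restored.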
	size mismatch for blocks.10.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.10.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.10.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.10.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.10.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.10.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.10.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.10.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for blocks.11.attn.qkv.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.11.attn.qkv.b: copying a param with shape torch.Size([30, 48, 12]) from checkpoint, the shape in current model is torch.Size([50, 48, 12]).
	size mismatch for blocks.11.attn.proj.a: copying a param with shape torch.Size([30, 4, 48]) from checkpoint, the shape in current model is torch.Size([50, 4, 48]).
	size mismatch for blocks.11.attn.proj.b: copying a param with shape torch.Size([30, 48, 4]) from checkpoint, the shape in current model is torch.Size([50, 48, 4]).
	size mismatch for blocks.11.mlp.fc1.a: copying a param with shape torch.Size([30, 4, 96]) from checkpoint, the shape in current model is torch.Size([50, 4, 96]).
	size mismatch for blocks.11.mlp.fc1.b: copying a param with shape torch.Size([30, 48, 8]) from checkpoint, the shape in current model is torch.Size([50, 48, 8]).
	size mismatch for blocks.11.mlp.fc2.a: copying a param with shape torch.Size([30, 8, 48]) from checkpoint, the shape in current model is torch.Size([50, 8, 48]).
	size mismatch for blocks.11.mlp.fc2.b: copying a param with shape torch.Size([30, 96, 4]) from checkpoint, the shape in current model is torch.Size([50, 96, 4]).
	size mismatch for head.a: copying a param with shape torch.Size([30, 4, 50]) from checkpoint, the shape in current model is torch.Size([50, 4, 50]).
	size mismatch for head.b: copying a param with shape torch.Size([30, 48, 2]) from checkpoint, the shape in current model is torch.Size([50, 48, 2]).
[2024-03-14 11:31:09,597] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2735529) of binary: /home/zhu.3723/anaconda3/envs/savit2.0/bin/python
Traceback (most recent call last):
  File "/home/zhu.3723/anaconda3/envs/savit2.0/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/zhu.3723/anaconda3/envs/savit2.0/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/zhu.3723/anaconda3/envs/savit2.0/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/zhu.3723/anaconda3/envs/savit2.0/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/zhu.3723/anaconda3/envs/savit2.0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/zhu.3723/anaconda3/envs/savit2.0/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-03-14_11:31:09
  host      : cse-cnc197017s.coeit.osu.edu
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 2735530)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-14_11:31:09
  host      : cse-cnc197017s.coeit.osu.edu
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2735529)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
| distributed init (rank 0): env://
| distributed init (rank 1): env://
Namespace(ThreeAugment=False, aa='rand-m9-mstd0.5-inc1', attn_only=False, batch_size=128, bce_loss=False, clip_grad=None, color_jitter=0.3, cooldown_epochs=10, cosub=False, cutmix=1.0, cutmix_minmax=None, data_path='/local/storage/ding/cifar100', data_set='CIFAR', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_eval=False, dist_url='env://', distillation_alpha=0.5, distillation_tau=1.0, distillation_type='none', distributed=True, drop=0.0, drop_path=0.1, epochs=50, eval=False, eval_crop_ratio=0.875, finetune='/home/zhu.3723/kronvit/output/cifar100_kron30/deit_tiny_patch16_224_30/best_checkpoint.pth', gpu=0, inat_category='name', input_size=224, kron=True, kron_a_freeze=True, kron_b_freeze=False, kron_rank=30, lr=0.001, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=0.0001, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='kron_deit_tiny_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='/home/zhu.3723/kronvit/output/cifar100_kron30/deit_tiny_patch16_224_30/', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, shape_bias=3, smoothing=0.1, src=False, start_epoch=0, teacher_model='regnety_160', teacher_path='', train_interpolation='bicubic', train_mode=True, unscale_lr=False, warmup_epochs=5, warmup_lr=0.001, weight_decay=0.05, world_size=2)
Files already downloaded and verified
Files already downloaded and verified
Creating model: kron_deit_tiny_patch16_224
kron_config: {'rank_rate': 6.0, 'structured_sparse': True, 'bias': False, 'shape_bias': 3, 'rank': 30}
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 48]) torch.Size([30, 48, 12])
torch.Size([30, 4, 48]) torch.Size([30, 48, 4])
torch.Size([30, 4, 96]) torch.Size([30, 48, 8])
torch.Size([30, 8, 48]) torch.Size([30, 96, 4])
torch.Size([30, 4, 50]) torch.Size([30, 48, 2])
number of params: 1206684
Start training for 50 epochs
Epoch: [0]  [  0/195]  eta: 0:08:19  lr: 0.001000  loss: 4.3708 (4.3708)  time: 2.5628  data: 0.6601  max mem: 6870
Epoch: [0]  [ 10/195]  eta: 0:02:53  lr: 0.001000  loss: 4.4611 (4.4544)  time: 0.9393  data: 0.0601  max mem: 6883
Epoch: [0]  [ 20/195]  eta: 0:02:30  lr: 0.001000  loss: 4.4634 (4.4622)  time: 0.7763  data: 0.0001  max mem: 6883
Epoch: [0]  [ 30/195]  eta: 0:02:17  lr: 0.001000  loss: 4.4634 (4.4563)  time: 0.7781  data: 0.0002  max mem: 6883
Epoch: [0]  [ 40/195]  eta: 0:02:07  lr: 0.001000  loss: 4.4482 (4.4472)  time: 0.7808  data: 0.0002  max mem: 6883
Epoch: [0]  [ 50/195]  eta: 0:01:58  lr: 0.001000  loss: 4.4408 (4.4434)  time: 0.7817  data: 0.0002  max mem: 6883
Epoch: [0]  [ 60/195]  eta: 0:01:49  lr: 0.001000  loss: 4.4347 (4.4392)  time: 0.7836  data: 0.0002  max mem: 6883
Epoch: [0]  [ 70/195]  eta: 0:01:40  lr: 0.001000  loss: 4.4201 (4.4333)  time: 0.7859  data: 0.0002  max mem: 6883
Epoch: [0]  [ 80/195]  eta: 0:01:32  lr: 0.001000  loss: 4.4201 (4.4336)  time: 0.7882  data: 0.0002  max mem: 6883
Epoch: [0]  [ 90/195]  eta: 0:01:24  lr: 0.001000  loss: 4.4022 (4.4281)  time: 0.7894  data: 0.0002  max mem: 6883
Epoch: [0]  [100/195]  eta: 0:01:16  lr: 0.001000  loss: 4.4001 (4.4235)  time: 0.7885  data: 0.0002  max mem: 6883
Epoch: [0]  [110/195]  eta: 0:01:08  lr: 0.001000  loss: 4.4074 (4.4239)  time: 0.7888  data: 0.0001  max mem: 6883
Epoch: [0]  [120/195]  eta: 0:00:59  lr: 0.001000  loss: 4.4280 (4.4233)  time: 0.7893  data: 0.0002  max mem: 6883
Epoch: [0]  [130/195]  eta: 0:00:51  lr: 0.001000  loss: 4.4200 (4.4226)  time: 0.7896  data: 0.0002  max mem: 6883
Epoch: [0]  [140/195]  eta: 0:00:43  lr: 0.001000  loss: 4.3815 (4.4202)  time: 0.7907  data: 0.0002  max mem: 6883
Epoch: [0]  [150/195]  eta: 0:00:35  lr: 0.001000  loss: 4.3815 (4.4185)  time: 0.7917  data: 0.0002  max mem: 6883
Epoch: [0]  [160/195]  eta: 0:00:27  lr: 0.001000  loss: 4.4071 (4.4192)  time: 0.7916  data: 0.0002  max mem: 6883
Epoch: [0]  [170/195]  eta: 0:00:19  lr: 0.001000  loss: 4.4332 (4.4177)  time: 0.7919  data: 0.0002  max mem: 6883
Epoch: [0]  [180/195]  eta: 0:00:11  lr: 0.001000  loss: 4.3642 (4.4141)  time: 0.7923  data: 0.0002  max mem: 6883
Epoch: [0]  [190/195]  eta: 0:00:03  lr: 0.001000  loss: 4.3975 (4.4140)  time: 0.7919  data: 0.0001  max mem: 6883
Epoch: [0]  [194/195]  eta: 0:00:00  lr: 0.001000  loss: 4.3975 (4.4128)  time: 0.7918  data: 0.0001  max mem: 6883
Epoch: [0] Total time: 0:02:35 (0.7966 s / it)
Averaged stats: lr: 0.001000  loss: 4.3975 (4.4075)
Test:  [ 0/53]  eta: 0:00:54  loss: 3.8779 (3.8779)  acc1: 13.5417 (13.5417)  acc5: 37.5000 (37.5000)  time: 1.0261  data: 0.5937  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.9081 (3.9081)  acc1: 11.9792 (12.4527)  acc5: 34.8958 (35.5114)  time: 0.3988  data: 0.0542  max mem: 6883
Test:  [20/53]  eta: 0:00:12  loss: 3.8873 (3.8958)  acc1: 12.5000 (12.9216)  acc5: 34.8958 (35.5159)  time: 0.3343  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.8868 (3.9015)  acc1: 13.0208 (12.9368)  acc5: 35.9375 (35.5343)  time: 0.3270  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.8879 (3.8959)  acc1: 14.0625 (13.4782)  acc5: 35.4167 (35.7088)  time: 0.3215  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.8820 (3.8918)  acc1: 15.6250 (13.6029)  acc5: 35.4167 (35.7230)  time: 0.3222  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.8820 (3.8854)  acc1: 15.6250 (13.6900)  acc5: 36.4583 (35.8400)  time: 0.3082  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3360 s / it)
* Acc@1 13.690 Acc@5 35.840 loss 3.885
Accuracy of the network on the 10000 test images: 13.7%
Max accuracy: 13.69%
Epoch: [1]  [  0/195]  eta: 0:04:46  lr: 0.001000  loss: 4.2963 (4.2963)  time: 1.4706  data: 0.6706  max mem: 6883
Epoch: [1]  [ 10/195]  eta: 0:02:39  lr: 0.001000  loss: 4.3649 (4.3654)  time: 0.8602  data: 0.0612  max mem: 6883
Epoch: [1]  [ 20/195]  eta: 0:02:25  lr: 0.001000  loss: 4.3501 (4.3577)  time: 0.7988  data: 0.0003  max mem: 6883
Epoch: [1]  [ 30/195]  eta: 0:02:15  lr: 0.001000  loss: 4.3493 (4.3440)  time: 0.7995  data: 0.0003  max mem: 6883
Epoch: [1]  [ 40/195]  eta: 0:02:06  lr: 0.001000  loss: 4.3531 (4.3494)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [1]  [ 50/195]  eta: 0:01:57  lr: 0.001000  loss: 4.3985 (4.3560)  time: 0.7957  data: 0.0003  max mem: 6883
Epoch: [1]  [ 60/195]  eta: 0:01:49  lr: 0.001000  loss: 4.3923 (4.3567)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [1]  [ 70/195]  eta: 0:01:40  lr: 0.001000  loss: 4.3327 (4.3539)  time: 0.7952  data: 0.0003  max mem: 6883
Epoch: [1]  [ 80/195]  eta: 0:01:32  lr: 0.001000  loss: 4.3534 (4.3603)  time: 0.7949  data: 0.0003  max mem: 6883
Epoch: [1]  [ 90/195]  eta: 0:01:24  lr: 0.001000  loss: 4.4115 (4.3624)  time: 0.7952  data: 0.0002  max mem: 6883
Epoch: [1]  [100/195]  eta: 0:01:16  lr: 0.001000  loss: 4.3938 (4.3634)  time: 0.7950  data: 0.0002  max mem: 6883
Epoch: [1]  [110/195]  eta: 0:01:08  lr: 0.001000  loss: 4.3457 (4.3621)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [1]  [120/195]  eta: 0:01:00  lr: 0.001000  loss: 4.3623 (4.3661)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [1]  [130/195]  eta: 0:00:52  lr: 0.001000  loss: 4.4019 (4.3693)  time: 0.8061  data: 0.0003  max mem: 6883
Epoch: [1]  [140/195]  eta: 0:00:44  lr: 0.001000  loss: 4.3981 (4.3687)  time: 0.8059  data: 0.0003  max mem: 6883
Epoch: [1]  [150/195]  eta: 0:00:36  lr: 0.001000  loss: 4.3952 (4.3709)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [1]  [160/195]  eta: 0:00:28  lr: 0.001000  loss: 4.3952 (4.3709)  time: 0.8000  data: 0.0002  max mem: 6883
Epoch: [1]  [170/195]  eta: 0:00:20  lr: 0.001000  loss: 4.3614 (4.3698)  time: 0.7996  data: 0.0002  max mem: 6883
Epoch: [1]  [180/195]  eta: 0:00:12  lr: 0.001000  loss: 4.3291 (4.3673)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [1]  [190/195]  eta: 0:00:04  lr: 0.001000  loss: 4.3562 (4.3673)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [1]  [194/195]  eta: 0:00:00  lr: 0.001000  loss: 4.3562 (4.3664)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [1] Total time: 0:02:36 (0.8018 s / it)
Averaged stats: lr: 0.001000  loss: 4.3562 (4.3682)
Test:  [ 0/53]  eta: 0:00:51  loss: 3.7555 (3.7555)  acc1: 17.1875 (17.1875)  acc5: 42.7083 (42.7083)  time: 0.9749  data: 0.6568  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.8183 (3.8037)  acc1: 14.0625 (13.9205)  acc5: 36.9792 (38.4943)  time: 0.3814  data: 0.0600  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.7636 (3.7880)  acc1: 15.1042 (14.8065)  acc5: 37.5000 (39.1121)  time: 0.3219  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.7813 (3.7910)  acc1: 15.1042 (14.4825)  acc5: 38.5417 (39.0457)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.8030 (3.7910)  acc1: 13.5417 (14.3420)  acc5: 38.0208 (39.1387)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.7807 (3.7868)  acc1: 13.5417 (14.2259)  acc5: 38.0208 (38.9604)  time: 0.3235  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.7574 (3.7805)  acc1: 13.5417 (14.2600)  acc5: 39.0625 (39.0900)  time: 0.3089  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3314 s / it)
* Acc@1 14.260 Acc@5 39.090 loss 3.780
Accuracy of the network on the 10000 test images: 14.3%
Max accuracy: 14.26%
Epoch: [2]  [  0/195]  eta: 0:04:43  lr: 0.000900  loss: 4.4906 (4.4906)  time: 1.4544  data: 0.6579  max mem: 6883
Epoch: [2]  [ 10/195]  eta: 0:02:38  lr: 0.000900  loss: 4.4077 (4.4199)  time: 0.8570  data: 0.0600  max mem: 6883
Epoch: [2]  [ 20/195]  eta: 0:02:24  lr: 0.000900  loss: 4.3984 (4.4122)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [2]  [ 30/195]  eta: 0:02:14  lr: 0.000900  loss: 4.3865 (4.3947)  time: 0.7949  data: 0.0003  max mem: 6883
Epoch: [2]  [ 40/195]  eta: 0:02:05  lr: 0.000900  loss: 4.3575 (4.3905)  time: 0.7949  data: 0.0003  max mem: 6883
Epoch: [2]  [ 50/195]  eta: 0:01:57  lr: 0.000900  loss: 4.3924 (4.3912)  time: 0.7957  data: 0.0003  max mem: 6883
Epoch: [2]  [ 60/195]  eta: 0:01:48  lr: 0.000900  loss: 4.4004 (4.3892)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [2]  [ 70/195]  eta: 0:01:40  lr: 0.000900  loss: 4.3238 (4.3787)  time: 0.7942  data: 0.0003  max mem: 6883
Epoch: [2]  [ 80/195]  eta: 0:01:32  lr: 0.000900  loss: 4.3097 (4.3681)  time: 0.7947  data: 0.0003  max mem: 6883
Epoch: [2]  [ 90/195]  eta: 0:01:24  lr: 0.000900  loss: 4.3109 (4.3653)  time: 0.7951  data: 0.0003  max mem: 6883
Epoch: [2]  [100/195]  eta: 0:01:16  lr: 0.000900  loss: 4.3406 (4.3682)  time: 0.7944  data: 0.0003  max mem: 6883
Epoch: [2]  [110/195]  eta: 0:01:08  lr: 0.000900  loss: 4.3760 (4.3647)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [2]  [120/195]  eta: 0:01:00  lr: 0.000900  loss: 4.3852 (4.3697)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [2]  [130/195]  eta: 0:00:52  lr: 0.000900  loss: 4.4052 (4.3704)  time: 0.7949  data: 0.0003  max mem: 6883
Epoch: [2]  [140/195]  eta: 0:00:43  lr: 0.000900  loss: 4.3985 (4.3676)  time: 0.7945  data: 0.0002  max mem: 6883
Epoch: [2]  [150/195]  eta: 0:00:35  lr: 0.000900  loss: 4.3560 (4.3668)  time: 0.7954  data: 0.0003  max mem: 6883
Epoch: [2]  [160/195]  eta: 0:00:27  lr: 0.000900  loss: 4.3560 (4.3651)  time: 0.7944  data: 0.0003  max mem: 6883
Epoch: [2]  [170/195]  eta: 0:00:19  lr: 0.000900  loss: 4.4262 (4.3685)  time: 0.7945  data: 0.0002  max mem: 6883
Epoch: [2]  [180/195]  eta: 0:00:11  lr: 0.000900  loss: 4.4024 (4.3699)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [2]  [190/195]  eta: 0:00:03  lr: 0.000900  loss: 4.3868 (4.3699)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [2]  [194/195]  eta: 0:00:00  lr: 0.000900  loss: 4.4024 (4.3702)  time: 0.7956  data: 0.0001  max mem: 6883
Epoch: [2] Total time: 0:02:35 (0.7991 s / it)
Averaged stats: lr: 0.000900  loss: 4.4024 (4.3683)
Test:  [ 0/53]  eta: 0:00:46  loss: 3.6905 (3.6905)  acc1: 19.2708 (19.2708)  acc5: 41.6667 (41.6667)  time: 0.8729  data: 0.5545  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.6942 (3.6982)  acc1: 16.1458 (16.1458)  acc5: 38.5417 (39.6780)  time: 0.3739  data: 0.0508  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.6922 (3.6927)  acc1: 15.6250 (15.9474)  acc5: 39.5833 (40.0298)  time: 0.3227  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.6877 (3.6942)  acc1: 15.1042 (15.6250)  acc5: 40.6250 (40.5578)  time: 0.3223  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.7019 (3.6974)  acc1: 15.1042 (15.4345)  acc5: 40.6250 (40.4853)  time: 0.3231  data: 0.0004  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.7019 (3.6955)  acc1: 15.1042 (15.4922)  acc5: 40.1042 (40.3799)  time: 0.3242  data: 0.0003  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.6799 (3.6887)  acc1: 15.6250 (15.5300)  acc5: 41.1458 (40.4100)  time: 0.3093  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3298 s / it)
* Acc@1 15.530 Acc@5 40.410 loss 3.689
Accuracy of the network on the 10000 test images: 15.5%
Max accuracy: 15.53%
Epoch: [3]  [  0/195]  eta: 0:05:18  lr: 0.000800  loss: 4.4173 (4.4173)  time: 1.6316  data: 0.8347  max mem: 6883
Epoch: [3]  [ 10/195]  eta: 0:02:41  lr: 0.000800  loss: 4.3481 (4.3457)  time: 0.8736  data: 0.0761  max mem: 6883
Epoch: [3]  [ 20/195]  eta: 0:02:26  lr: 0.000800  loss: 4.3456 (4.3423)  time: 0.7967  data: 0.0003  max mem: 6883
Epoch: [3]  [ 30/195]  eta: 0:02:15  lr: 0.000800  loss: 4.3551 (4.3492)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [3]  [ 40/195]  eta: 0:02:06  lr: 0.000800  loss: 4.3765 (4.3659)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [3]  [ 50/195]  eta: 0:01:57  lr: 0.000800  loss: 4.3873 (4.3632)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [3]  [ 60/195]  eta: 0:01:49  lr: 0.000800  loss: 4.3445 (4.3531)  time: 0.7951  data: 0.0003  max mem: 6883
Epoch: [3]  [ 70/195]  eta: 0:01:41  lr: 0.000800  loss: 4.3210 (4.3504)  time: 0.7962  data: 0.0003  max mem: 6883
Epoch: [3]  [ 80/195]  eta: 0:01:32  lr: 0.000800  loss: 4.2980 (4.3457)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [3]  [ 90/195]  eta: 0:01:24  lr: 0.000800  loss: 4.3720 (4.3478)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [3]  [100/195]  eta: 0:01:16  lr: 0.000800  loss: 4.3720 (4.3489)  time: 0.7961  data: 0.0003  max mem: 6883
Epoch: [3]  [110/195]  eta: 0:01:08  lr: 0.000800  loss: 4.3610 (4.3489)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [3]  [120/195]  eta: 0:01:00  lr: 0.000800  loss: 4.3925 (4.3467)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [3]  [130/195]  eta: 0:00:52  lr: 0.000800  loss: 4.3570 (4.3411)  time: 0.7965  data: 0.0003  max mem: 6883
Epoch: [3]  [140/195]  eta: 0:00:44  lr: 0.000800  loss: 4.3342 (4.3398)  time: 0.7959  data: 0.0003  max mem: 6883
Epoch: [3]  [150/195]  eta: 0:00:36  lr: 0.000800  loss: 4.3461 (4.3414)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [3]  [160/195]  eta: 0:00:28  lr: 0.000800  loss: 4.3963 (4.3442)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [3]  [170/195]  eta: 0:00:20  lr: 0.000800  loss: 4.3789 (4.3425)  time: 0.7950  data: 0.0003  max mem: 6883
Epoch: [3]  [180/195]  eta: 0:00:12  lr: 0.000800  loss: 4.3660 (4.3446)  time: 0.7964  data: 0.0004  max mem: 6883
Epoch: [3]  [190/195]  eta: 0:00:04  lr: 0.000800  loss: 4.3853 (4.3467)  time: 0.7952  data: 0.0003  max mem: 6883
Epoch: [3]  [194/195]  eta: 0:00:00  lr: 0.000800  loss: 4.3853 (4.3468)  time: 0.7947  data: 0.0003  max mem: 6883
Epoch: [3] Total time: 0:02:36 (0.8009 s / it)
Averaged stats: lr: 0.000800  loss: 4.3853 (4.3482)
Test:  [ 0/53]  eta: 0:00:52  loss: 3.7096 (3.7096)  acc1: 17.7083 (17.7083)  acc5: 40.6250 (40.6250)  time: 0.9943  data: 0.6759  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.7096 (3.7107)  acc1: 14.0625 (14.2992)  acc5: 40.6250 (40.9091)  time: 0.3832  data: 0.0617  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.6876 (3.6957)  acc1: 14.0625 (14.4841)  acc5: 41.6667 (41.3194)  time: 0.3217  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.6777 (3.6929)  acc1: 14.5833 (14.9698)  acc5: 41.6667 (41.4315)  time: 0.3215  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.6793 (3.6912)  acc1: 16.1458 (15.2185)  acc5: 41.6667 (41.5142)  time: 0.3216  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.6651 (3.6901)  acc1: 15.1042 (15.1144)  acc5: 41.1458 (41.2888)  time: 0.3228  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.6622 (3.6846)  acc1: 15.6250 (15.1300)  acc5: 41.1458 (41.3900)  time: 0.3082  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3310 s / it)
* Acc@1 15.130 Acc@5 41.390 loss 3.685
Accuracy of the network on the 10000 test images: 15.1%
Max accuracy: 15.53%
Epoch: [4]  [  0/195]  eta: 0:05:06  lr: 0.000700  loss: 4.3920 (4.3920)  time: 1.5699  data: 0.7704  max mem: 6883
Epoch: [4]  [ 10/195]  eta: 0:02:40  lr: 0.000700  loss: 4.2897 (4.3157)  time: 0.8667  data: 0.0702  max mem: 6883
Epoch: [4]  [ 20/195]  eta: 0:02:25  lr: 0.000700  loss: 4.3439 (4.3359)  time: 0.7967  data: 0.0003  max mem: 6883
Epoch: [4]  [ 30/195]  eta: 0:02:15  lr: 0.000700  loss: 4.3035 (4.3018)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [4]  [ 40/195]  eta: 0:02:06  lr: 0.000700  loss: 4.2739 (4.3171)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [4]  [ 50/195]  eta: 0:01:57  lr: 0.000700  loss: 4.3020 (4.3093)  time: 0.8003  data: 0.0003  max mem: 6883
Epoch: [4]  [ 60/195]  eta: 0:01:49  lr: 0.000700  loss: 4.2959 (4.3107)  time: 0.8007  data: 0.0002  max mem: 6883
Epoch: [4]  [ 70/195]  eta: 0:01:41  lr: 0.000700  loss: 4.2972 (4.3123)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [4]  [ 80/195]  eta: 0:01:32  lr: 0.000700  loss: 4.2874 (4.3082)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [4]  [ 90/195]  eta: 0:01:24  lr: 0.000700  loss: 4.3073 (4.3114)  time: 0.7961  data: 0.0002  max mem: 6883
Epoch: [4]  [100/195]  eta: 0:01:16  lr: 0.000700  loss: 4.3069 (4.3107)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [4]  [110/195]  eta: 0:01:08  lr: 0.000700  loss: 4.2514 (4.3117)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [4]  [120/195]  eta: 0:01:00  lr: 0.000700  loss: 4.2673 (4.3093)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [4]  [130/195]  eta: 0:00:52  lr: 0.000700  loss: 4.3296 (4.3100)  time: 0.7985  data: 0.0003  max mem: 6883
Epoch: [4]  [140/195]  eta: 0:00:44  lr: 0.000700  loss: 4.3592 (4.3142)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [4]  [150/195]  eta: 0:00:36  lr: 0.000700  loss: 4.3945 (4.3195)  time: 0.7953  data: 0.0003  max mem: 6883
Epoch: [4]  [160/195]  eta: 0:00:28  lr: 0.000700  loss: 4.3847 (4.3187)  time: 0.7953  data: 0.0003  max mem: 6883
Epoch: [4]  [170/195]  eta: 0:00:20  lr: 0.000700  loss: 4.2450 (4.3141)  time: 0.7949  data: 0.0003  max mem: 6883
Epoch: [4]  [180/195]  eta: 0:00:12  lr: 0.000700  loss: 4.2695 (4.3118)  time: 0.7959  data: 0.0003  max mem: 6883
Epoch: [4]  [190/195]  eta: 0:00:04  lr: 0.000700  loss: 4.3525 (4.3151)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [4]  [194/195]  eta: 0:00:00  lr: 0.000700  loss: 4.3651 (4.3167)  time: 0.7949  data: 0.0002  max mem: 6883
Epoch: [4] Total time: 0:02:36 (0.8012 s / it)
Averaged stats: lr: 0.000700  loss: 4.3651 (4.3143)
Test:  [ 0/53]  eta: 0:00:51  loss: 3.5895 (3.5895)  acc1: 20.8333 (20.8333)  acc5: 45.3125 (45.3125)  time: 0.9782  data: 0.6605  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.6062 (3.6135)  acc1: 17.7083 (17.7083)  acc5: 43.7500 (45.0284)  time: 0.3819  data: 0.0604  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.6028 (3.6060)  acc1: 18.2292 (18.7500)  acc5: 44.2708 (45.4365)  time: 0.3219  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.6022 (3.6073)  acc1: 18.7500 (18.8844)  acc5: 45.3125 (45.5141)  time: 0.3215  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.6113 (3.6128)  acc1: 18.7500 (18.7246)  acc5: 44.7917 (45.2617)  time: 0.3213  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.6139 (3.6145)  acc1: 18.2292 (18.4334)  acc5: 43.7500 (45.1593)  time: 0.3223  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.6030 (3.6106)  acc1: 18.2292 (18.4900)  acc5: 44.2708 (45.2200)  time: 0.3078  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3304 s / it)
* Acc@1 18.490 Acc@5 45.220 loss 3.611
Accuracy of the network on the 10000 test images: 18.5%
Max accuracy: 18.49%
Epoch: [5]  [  0/195]  eta: 0:04:57  lr: 0.000600  loss: 4.5349 (4.5349)  time: 1.5250  data: 0.7292  max mem: 6883
Epoch: [5]  [ 10/195]  eta: 0:02:39  lr: 0.000600  loss: 4.3571 (4.3079)  time: 0.8626  data: 0.0665  max mem: 6883
Epoch: [5]  [ 20/195]  eta: 0:02:25  lr: 0.000600  loss: 4.2951 (4.3163)  time: 0.7946  data: 0.0003  max mem: 6883
Epoch: [5]  [ 30/195]  eta: 0:02:15  lr: 0.000600  loss: 4.2911 (4.3082)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [5]  [ 40/195]  eta: 0:02:06  lr: 0.000600  loss: 4.3298 (4.3262)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [5]  [ 50/195]  eta: 0:01:57  lr: 0.000600  loss: 4.3394 (4.3189)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [5]  [ 60/195]  eta: 0:01:48  lr: 0.000600  loss: 4.3069 (4.3197)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [5]  [ 70/195]  eta: 0:01:40  lr: 0.000600  loss: 4.3251 (4.3204)  time: 0.7944  data: 0.0002  max mem: 6883
Epoch: [5]  [ 80/195]  eta: 0:01:32  lr: 0.000600  loss: 4.3262 (4.3221)  time: 0.7932  data: 0.0002  max mem: 6883
Epoch: [5]  [ 90/195]  eta: 0:01:24  lr: 0.000600  loss: 4.3630 (4.3259)  time: 0.7940  data: 0.0002  max mem: 6883
Epoch: [5]  [100/195]  eta: 0:01:16  lr: 0.000600  loss: 4.3335 (4.3204)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [5]  [110/195]  eta: 0:01:08  lr: 0.000600  loss: 4.3202 (4.3204)  time: 0.7950  data: 0.0003  max mem: 6883
Epoch: [5]  [120/195]  eta: 0:01:00  lr: 0.000600  loss: 4.3039 (4.3167)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [5]  [130/195]  eta: 0:00:52  lr: 0.000600  loss: 4.3016 (4.3177)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [5]  [140/195]  eta: 0:00:44  lr: 0.000600  loss: 4.3243 (4.3163)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [5]  [150/195]  eta: 0:00:36  lr: 0.000600  loss: 4.3048 (4.3155)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [5]  [160/195]  eta: 0:00:28  lr: 0.000600  loss: 4.3120 (4.3150)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [5]  [170/195]  eta: 0:00:19  lr: 0.000600  loss: 4.3234 (4.3153)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [5]  [180/195]  eta: 0:00:11  lr: 0.000600  loss: 4.3381 (4.3160)  time: 0.7957  data: 0.0003  max mem: 6883
Epoch: [5]  [190/195]  eta: 0:00:03  lr: 0.000600  loss: 4.3381 (4.3141)  time: 0.7948  data: 0.0002  max mem: 6883
Epoch: [5]  [194/195]  eta: 0:00:00  lr: 0.000600  loss: 4.3438 (4.3153)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [5] Total time: 0:02:35 (0.7998 s / it)
Averaged stats: lr: 0.000600  loss: 4.3438 (4.3091)
Test:  [ 0/53]  eta: 0:00:49  loss: 3.4457 (3.4457)  acc1: 24.4792 (24.4792)  acc5: 48.4375 (48.4375)  time: 0.9388  data: 0.6192  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.5283 (3.5329)  acc1: 19.7917 (19.3182)  acc5: 44.7917 (45.6439)  time: 0.3788  data: 0.0566  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.5136 (3.5214)  acc1: 19.7917 (19.6925)  acc5: 46.3542 (46.7262)  time: 0.3220  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.5136 (3.5217)  acc1: 19.7917 (19.8085)  acc5: 46.3542 (46.4382)  time: 0.3223  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.5209 (3.5230)  acc1: 19.7917 (19.8298)  acc5: 45.3125 (46.1382)  time: 0.3223  data: 0.0004  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.5125 (3.5229)  acc1: 18.7500 (19.4547)  acc5: 46.3542 (46.3031)  time: 0.3219  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.5061 (3.5180)  acc1: 18.7500 (19.5300)  acc5: 46.8750 (46.3900)  time: 0.3077  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3301 s / it)
* Acc@1 19.530 Acc@5 46.390 loss 3.518
Accuracy of the network on the 10000 test images: 19.5%
Max accuracy: 19.53%
Epoch: [6]  [  0/195]  eta: 0:05:13  lr: 0.000490  loss: 4.0899 (4.0899)  time: 1.6082  data: 0.8146  max mem: 6883
Epoch: [6]  [ 10/195]  eta: 0:02:40  lr: 0.000490  loss: 4.3102 (4.3140)  time: 0.8697  data: 0.0743  max mem: 6883
Epoch: [6]  [ 20/195]  eta: 0:02:25  lr: 0.000490  loss: 4.3136 (4.3181)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [6]  [ 30/195]  eta: 0:02:15  lr: 0.000490  loss: 4.3267 (4.3189)  time: 0.7959  data: 0.0003  max mem: 6883
Epoch: [6]  [ 40/195]  eta: 0:02:06  lr: 0.000490  loss: 4.3081 (4.3071)  time: 0.7962  data: 0.0003  max mem: 6883
Epoch: [6]  [ 50/195]  eta: 0:01:57  lr: 0.000490  loss: 4.2856 (4.2929)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [6]  [ 60/195]  eta: 0:01:49  lr: 0.000490  loss: 4.2968 (4.3076)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [6]  [ 70/195]  eta: 0:01:40  lr: 0.000490  loss: 4.3320 (4.3077)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [6]  [ 80/195]  eta: 0:01:32  lr: 0.000490  loss: 4.3148 (4.3020)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [6]  [ 90/195]  eta: 0:01:24  lr: 0.000490  loss: 4.3172 (4.3062)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [6]  [100/195]  eta: 0:01:16  lr: 0.000490  loss: 4.3237 (4.3092)  time: 0.7962  data: 0.0003  max mem: 6883
Epoch: [6]  [110/195]  eta: 0:01:08  lr: 0.000490  loss: 4.3462 (4.3103)  time: 0.8000  data: 0.0003  max mem: 6883
Epoch: [6]  [120/195]  eta: 0:01:00  lr: 0.000490  loss: 4.3298 (4.3144)  time: 0.7993  data: 0.0003  max mem: 6883
Epoch: [6]  [130/195]  eta: 0:00:52  lr: 0.000490  loss: 4.3180 (4.3127)  time: 0.7995  data: 0.0002  max mem: 6883
Epoch: [6]  [140/195]  eta: 0:00:44  lr: 0.000490  loss: 4.2829 (4.3096)  time: 0.7999  data: 0.0003  max mem: 6883
Epoch: [6]  [150/195]  eta: 0:00:36  lr: 0.000490  loss: 4.2498 (4.3042)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [6]  [160/195]  eta: 0:00:28  lr: 0.000490  loss: 4.2800 (4.3012)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [6]  [170/195]  eta: 0:00:20  lr: 0.000490  loss: 4.3136 (4.3011)  time: 0.7947  data: 0.0003  max mem: 6883
Epoch: [6]  [180/195]  eta: 0:00:12  lr: 0.000490  loss: 4.3324 (4.3008)  time: 0.7942  data: 0.0003  max mem: 6883
Epoch: [6]  [190/195]  eta: 0:00:04  lr: 0.000490  loss: 4.3592 (4.3053)  time: 0.7941  data: 0.0002  max mem: 6883
Epoch: [6]  [194/195]  eta: 0:00:00  lr: 0.000490  loss: 4.3803 (4.3068)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [6] Total time: 0:02:36 (0.8012 s / it)
Averaged stats: lr: 0.000490  loss: 4.3803 (4.3027)
Test:  [ 0/53]  eta: 0:00:47  loss: 3.5356 (3.5356)  acc1: 22.9167 (22.9167)  acc5: 51.5625 (51.5625)  time: 0.8885  data: 0.5699  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.5630 (3.5798)  acc1: 20.3125 (20.0284)  acc5: 47.3958 (47.3011)  time: 0.3746  data: 0.0521  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.5559 (3.5697)  acc1: 18.7500 (19.7669)  acc5: 47.9167 (48.0159)  time: 0.3222  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.5649 (3.5706)  acc1: 19.2708 (19.4724)  acc5: 48.9583 (48.0847)  time: 0.3218  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.5654 (3.5692)  acc1: 20.3125 (19.7917)  acc5: 47.3958 (48.1199)  time: 0.3218  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.5576 (3.5684)  acc1: 19.7917 (19.7610)  acc5: 47.9167 (48.1413)  time: 0.3235  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.5397 (3.5620)  acc1: 20.3125 (19.8000)  acc5: 48.4375 (48.2100)  time: 0.3087  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3298 s / it)
* Acc@1 19.800 Acc@5 48.210 loss 3.562
Accuracy of the network on the 10000 test images: 19.8%
Max accuracy: 19.80%
Epoch: [7]  [  0/195]  eta: 0:04:44  lr: 0.000486  loss: 4.3562 (4.3562)  time: 1.4594  data: 0.6614  max mem: 6883
Epoch: [7]  [ 10/195]  eta: 0:02:38  lr: 0.000486  loss: 4.3259 (4.3130)  time: 0.8564  data: 0.0603  max mem: 6883
Epoch: [7]  [ 20/195]  eta: 0:02:24  lr: 0.000486  loss: 4.3227 (4.3076)  time: 0.7942  data: 0.0002  max mem: 6883
Epoch: [7]  [ 30/195]  eta: 0:02:14  lr: 0.000486  loss: 4.3175 (4.2911)  time: 0.7943  data: 0.0002  max mem: 6883
Epoch: [7]  [ 40/195]  eta: 0:02:05  lr: 0.000486  loss: 4.3335 (4.3073)  time: 0.7952  data: 0.0003  max mem: 6883
Epoch: [7]  [ 50/195]  eta: 0:01:57  lr: 0.000486  loss: 4.3374 (4.3048)  time: 0.7944  data: 0.0003  max mem: 6883
Epoch: [7]  [ 60/195]  eta: 0:01:48  lr: 0.000486  loss: 4.2930 (4.3059)  time: 0.7937  data: 0.0002  max mem: 6883
Epoch: [7]  [ 70/195]  eta: 0:01:40  lr: 0.000486  loss: 4.3029 (4.3029)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [7]  [ 80/195]  eta: 0:01:32  lr: 0.000486  loss: 4.3097 (4.3018)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [7]  [ 90/195]  eta: 0:01:24  lr: 0.000486  loss: 4.2736 (4.2983)  time: 0.7949  data: 0.0002  max mem: 6883
Epoch: [7]  [100/195]  eta: 0:01:16  lr: 0.000486  loss: 4.2848 (4.3026)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [7]  [110/195]  eta: 0:01:08  lr: 0.000486  loss: 4.3484 (4.2974)  time: 0.7952  data: 0.0003  max mem: 6883
Epoch: [7]  [120/195]  eta: 0:01:00  lr: 0.000486  loss: 4.2487 (4.2894)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [7]  [130/195]  eta: 0:00:52  lr: 0.000486  loss: 4.1442 (4.2812)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [7]  [140/195]  eta: 0:00:44  lr: 0.000486  loss: 4.2282 (4.2825)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [7]  [150/195]  eta: 0:00:35  lr: 0.000486  loss: 4.2871 (4.2849)  time: 0.7967  data: 0.0003  max mem: 6883
Epoch: [7]  [160/195]  eta: 0:00:27  lr: 0.000486  loss: 4.3149 (4.2867)  time: 0.7962  data: 0.0003  max mem: 6883
Epoch: [7]  [170/195]  eta: 0:00:19  lr: 0.000486  loss: 4.2857 (4.2868)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [7]  [180/195]  eta: 0:00:11  lr: 0.000486  loss: 4.2450 (4.2855)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [7]  [190/195]  eta: 0:00:03  lr: 0.000486  loss: 4.2775 (4.2859)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [7]  [194/195]  eta: 0:00:00  lr: 0.000486  loss: 4.2677 (4.2859)  time: 0.7975  data: 0.0001  max mem: 6883
Epoch: [7] Total time: 0:02:35 (0.7997 s / it)
Averaged stats: lr: 0.000486  loss: 4.2677 (4.2870)
Test:  [ 0/53]  eta: 0:00:55  loss: 3.4164 (3.4164)  acc1: 26.5625 (26.5625)  acc5: 51.0417 (51.0417)  time: 1.0521  data: 0.7326  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.4636 (3.4821)  acc1: 20.8333 (20.3125)  acc5: 48.9583 (49.0530)  time: 0.3885  data: 0.0669  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.4636 (3.4750)  acc1: 19.7917 (20.0149)  acc5: 48.9583 (48.8591)  time: 0.3217  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.4528 (3.4736)  acc1: 19.7917 (20.4301)  acc5: 48.9583 (48.9079)  time: 0.3218  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.4758 (3.4797)  acc1: 20.8333 (20.4776)  acc5: 48.9583 (48.7424)  time: 0.3220  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.4737 (3.4785)  acc1: 19.7917 (20.2819)  acc5: 48.9583 (48.7643)  time: 0.3218  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.4600 (3.4724)  acc1: 19.7917 (20.3200)  acc5: 48.9583 (48.8000)  time: 0.3072  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3319 s / it)
* Acc@1 20.320 Acc@5 48.800 loss 3.472
Accuracy of the network on the 10000 test images: 20.3%
Max accuracy: 20.32%
Epoch: [8]  [  0/195]  eta: 0:04:25  lr: 0.000481  loss: 4.0436 (4.0436)  time: 1.3597  data: 0.5603  max mem: 6883
Epoch: [8]  [ 10/195]  eta: 0:02:37  lr: 0.000481  loss: 4.1760 (4.1552)  time: 0.8489  data: 0.0512  max mem: 6883
Epoch: [8]  [ 20/195]  eta: 0:02:23  lr: 0.000481  loss: 4.2264 (4.2221)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [8]  [ 30/195]  eta: 0:02:14  lr: 0.000481  loss: 4.2855 (4.2379)  time: 0.7959  data: 0.0003  max mem: 6883
Epoch: [8]  [ 40/195]  eta: 0:02:05  lr: 0.000481  loss: 4.3112 (4.2655)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [8]  [ 50/195]  eta: 0:01:57  lr: 0.000481  loss: 4.3304 (4.2746)  time: 0.7948  data: 0.0002  max mem: 6883
Epoch: [8]  [ 60/195]  eta: 0:01:48  lr: 0.000481  loss: 4.2770 (4.2720)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [8]  [ 70/195]  eta: 0:01:40  lr: 0.000481  loss: 4.2491 (4.2750)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [8]  [ 80/195]  eta: 0:01:32  lr: 0.000481  loss: 4.2757 (4.2746)  time: 0.7956  data: 0.0003  max mem: 6883
Epoch: [8]  [ 90/195]  eta: 0:01:24  lr: 0.000481  loss: 4.1917 (4.2628)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [8]  [100/195]  eta: 0:01:16  lr: 0.000481  loss: 4.1917 (4.2594)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [8]  [110/195]  eta: 0:01:08  lr: 0.000481  loss: 4.2396 (4.2566)  time: 0.7946  data: 0.0002  max mem: 6883
Epoch: [8]  [120/195]  eta: 0:01:00  lr: 0.000481  loss: 4.2721 (4.2603)  time: 0.7943  data: 0.0002  max mem: 6883
Epoch: [8]  [130/195]  eta: 0:00:51  lr: 0.000481  loss: 4.2800 (4.2632)  time: 0.7943  data: 0.0002  max mem: 6883
Epoch: [8]  [140/195]  eta: 0:00:43  lr: 0.000481  loss: 4.3548 (4.2706)  time: 0.7937  data: 0.0002  max mem: 6883
Epoch: [8]  [150/195]  eta: 0:00:35  lr: 0.000481  loss: 4.3296 (4.2696)  time: 0.7946  data: 0.0002  max mem: 6883
Epoch: [8]  [160/195]  eta: 0:00:27  lr: 0.000481  loss: 4.2491 (4.2694)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [8]  [170/195]  eta: 0:00:19  lr: 0.000481  loss: 4.3437 (4.2739)  time: 0.7939  data: 0.0002  max mem: 6883
Epoch: [8]  [180/195]  eta: 0:00:11  lr: 0.000481  loss: 4.3437 (4.2730)  time: 0.7942  data: 0.0002  max mem: 6883
Epoch: [8]  [190/195]  eta: 0:00:03  lr: 0.000481  loss: 4.2463 (4.2710)  time: 0.7941  data: 0.0001  max mem: 6883
Epoch: [8]  [194/195]  eta: 0:00:00  lr: 0.000481  loss: 4.2463 (4.2692)  time: 0.7945  data: 0.0001  max mem: 6883
Epoch: [8] Total time: 0:02:35 (0.7986 s / it)
Averaged stats: lr: 0.000481  loss: 4.2463 (4.2720)
Test:  [ 0/53]  eta: 0:00:52  loss: 3.3435 (3.3435)  acc1: 28.1250 (28.1250)  acc5: 52.0833 (52.0833)  time: 0.9907  data: 0.6724  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.3808 (3.4055)  acc1: 21.3542 (20.8807)  acc5: 52.0833 (50.1894)  time: 0.3826  data: 0.0614  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.3867 (3.3952)  acc1: 21.3542 (21.4782)  acc5: 49.4792 (49.6528)  time: 0.3215  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.3867 (3.3959)  acc1: 21.3542 (21.2030)  acc5: 49.4792 (49.7480)  time: 0.3215  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.3831 (3.3975)  acc1: 20.3125 (21.0747)  acc5: 49.4792 (49.7205)  time: 0.3221  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.3813 (3.3958)  acc1: 21.3542 (21.0886)  acc5: 49.4792 (49.7345)  time: 0.3224  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.3593 (3.3898)  acc1: 21.3542 (21.1400)  acc5: 49.4792 (49.7900)  time: 0.3075  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3305 s / it)
* Acc@1 21.140 Acc@5 49.790 loss 3.390
Accuracy of the network on the 10000 test images: 21.1%
Max accuracy: 21.14%
Epoch: [9]  [  0/195]  eta: 0:04:24  lr: 0.000475  loss: 4.2049 (4.2049)  time: 1.3555  data: 0.5575  max mem: 6883
Epoch: [9]  [ 10/195]  eta: 0:02:36  lr: 0.000475  loss: 4.3461 (4.3106)  time: 0.8470  data: 0.0508  max mem: 6883
Epoch: [9]  [ 20/195]  eta: 0:02:23  lr: 0.000475  loss: 4.3431 (4.3047)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [9]  [ 30/195]  eta: 0:02:14  lr: 0.000475  loss: 4.3181 (4.2777)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [9]  [ 40/195]  eta: 0:02:05  lr: 0.000475  loss: 4.3270 (4.2919)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [9]  [ 50/195]  eta: 0:01:57  lr: 0.000475  loss: 4.2915 (4.2801)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [9]  [ 60/195]  eta: 0:01:48  lr: 0.000475  loss: 4.2620 (4.2784)  time: 0.7961  data: 0.0003  max mem: 6883
Epoch: [9]  [ 70/195]  eta: 0:01:40  lr: 0.000475  loss: 4.2917 (4.2839)  time: 0.7954  data: 0.0003  max mem: 6883
Epoch: [9]  [ 80/195]  eta: 0:01:32  lr: 0.000475  loss: 4.3263 (4.2873)  time: 0.7961  data: 0.0003  max mem: 6883
Epoch: [9]  [ 90/195]  eta: 0:01:24  lr: 0.000475  loss: 4.3046 (4.2783)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [9]  [100/195]  eta: 0:01:16  lr: 0.000475  loss: 4.1973 (4.2674)  time: 0.7967  data: 0.0003  max mem: 6883
Epoch: [9]  [110/195]  eta: 0:01:08  lr: 0.000475  loss: 4.2903 (4.2666)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [9]  [120/195]  eta: 0:01:00  lr: 0.000475  loss: 4.3014 (4.2644)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [9]  [130/195]  eta: 0:00:52  lr: 0.000475  loss: 4.3157 (4.2682)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [9]  [140/195]  eta: 0:00:44  lr: 0.000475  loss: 4.3114 (4.2688)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [9]  [150/195]  eta: 0:00:36  lr: 0.000475  loss: 4.3114 (4.2716)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [9]  [160/195]  eta: 0:00:28  lr: 0.000475  loss: 4.2770 (4.2700)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [9]  [170/195]  eta: 0:00:19  lr: 0.000475  loss: 4.2283 (4.2643)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [9]  [180/195]  eta: 0:00:11  lr: 0.000475  loss: 4.2007 (4.2635)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [9]  [190/195]  eta: 0:00:03  lr: 0.000475  loss: 4.2039 (4.2591)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [9]  [194/195]  eta: 0:00:00  lr: 0.000475  loss: 4.2007 (4.2579)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [9] Total time: 0:02:35 (0.7998 s / it)
Averaged stats: lr: 0.000475  loss: 4.2007 (4.2499)
Test:  [ 0/53]  eta: 0:01:03  loss: 3.3869 (3.3869)  acc1: 25.5208 (25.5208)  acc5: 54.1667 (54.1667)  time: 1.1923  data: 0.8752  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.4036 (3.4051)  acc1: 20.8333 (20.6439)  acc5: 51.5625 (50.6629)  time: 0.4015  data: 0.0798  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.3696 (3.3927)  acc1: 20.8333 (21.1062)  acc5: 51.0417 (51.1657)  time: 0.3218  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.3696 (3.3909)  acc1: 21.8750 (21.4886)  acc5: 50.5208 (50.9745)  time: 0.3221  data: 0.0006  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.3841 (3.3942)  acc1: 21.3542 (21.3923)  acc5: 50.0000 (50.9400)  time: 0.3221  data: 0.0006  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.3780 (3.3922)  acc1: 19.7917 (21.3950)  acc5: 51.0417 (50.8681)  time: 0.3221  data: 0.0004  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.3633 (3.3879)  acc1: 21.3542 (21.4500)  acc5: 51.5625 (50.9400)  time: 0.3077  data: 0.0003  max mem: 6883
Test: Total time: 0:00:17 (0.3352 s / it)
* Acc@1 21.450 Acc@5 50.940 loss 3.388
Accuracy of the network on the 10000 test images: 21.5%
Max accuracy: 21.45%
Epoch: [10]  [  0/195]  eta: 0:04:14  lr: 0.000469  loss: 4.2890 (4.2890)  time: 1.3046  data: 0.5035  max mem: 6883
Epoch: [10]  [ 10/195]  eta: 0:02:36  lr: 0.000469  loss: 4.2890 (4.2641)  time: 0.8434  data: 0.0460  max mem: 6883
Epoch: [10]  [ 20/195]  eta: 0:02:23  lr: 0.000469  loss: 4.2475 (4.2305)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [10]  [ 30/195]  eta: 0:02:14  lr: 0.000469  loss: 4.2379 (4.2268)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [10]  [ 40/195]  eta: 0:02:05  lr: 0.000469  loss: 4.2379 (4.2329)  time: 0.7961  data: 0.0003  max mem: 6883
Epoch: [10]  [ 50/195]  eta: 0:01:56  lr: 0.000469  loss: 4.2962 (4.2348)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [10]  [ 60/195]  eta: 0:01:48  lr: 0.000469  loss: 4.3076 (4.2487)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [10]  [ 70/195]  eta: 0:01:40  lr: 0.000469  loss: 4.2734 (4.2436)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [10]  [ 80/195]  eta: 0:01:32  lr: 0.000469  loss: 4.2790 (4.2562)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [10]  [ 90/195]  eta: 0:01:24  lr: 0.000469  loss: 4.3369 (4.2613)  time: 0.7967  data: 0.0003  max mem: 6883
Epoch: [10]  [100/195]  eta: 0:01:16  lr: 0.000469  loss: 4.2358 (4.2537)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [10]  [110/195]  eta: 0:01:08  lr: 0.000469  loss: 4.1701 (4.2506)  time: 0.7996  data: 0.0002  max mem: 6883
Epoch: [10]  [120/195]  eta: 0:01:00  lr: 0.000469  loss: 4.2748 (4.2490)  time: 0.8006  data: 0.0003  max mem: 6883
Epoch: [10]  [130/195]  eta: 0:00:52  lr: 0.000469  loss: 4.2796 (4.2493)  time: 0.7965  data: 0.0003  max mem: 6883
Epoch: [10]  [140/195]  eta: 0:00:44  lr: 0.000469  loss: 4.3096 (4.2520)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [10]  [150/195]  eta: 0:00:36  lr: 0.000469  loss: 4.3096 (4.2513)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [10]  [160/195]  eta: 0:00:27  lr: 0.000469  loss: 4.2929 (4.2538)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [10]  [170/195]  eta: 0:00:19  lr: 0.000469  loss: 4.2929 (4.2523)  time: 0.7944  data: 0.0002  max mem: 6883
Epoch: [10]  [180/195]  eta: 0:00:11  lr: 0.000469  loss: 4.2667 (4.2548)  time: 0.7952  data: 0.0002  max mem: 6883
Epoch: [10]  [190/195]  eta: 0:00:03  lr: 0.000469  loss: 4.2860 (4.2579)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [10]  [194/195]  eta: 0:00:00  lr: 0.000469  loss: 4.2860 (4.2561)  time: 0.7953  data: 0.0001  max mem: 6883
Epoch: [10] Total time: 0:02:35 (0.7995 s / it)
Averaged stats: lr: 0.000469  loss: 4.2860 (4.2553)
Test:  [ 0/53]  eta: 0:00:46  loss: 3.3548 (3.3548)  acc1: 25.5208 (25.5208)  acc5: 54.6875 (54.6875)  time: 0.8814  data: 0.5610  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.4068 (3.3981)  acc1: 22.3958 (22.0170)  acc5: 53.1250 (52.1307)  time: 0.3740  data: 0.0512  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.3678 (3.3835)  acc1: 22.3958 (22.3710)  acc5: 52.0833 (52.6786)  time: 0.3222  data: 0.0002  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.3678 (3.3862)  acc1: 21.8750 (22.3118)  acc5: 52.0833 (52.3522)  time: 0.3224  data: 0.0002  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.3798 (3.3892)  acc1: 21.3542 (22.1672)  acc5: 51.0417 (51.9817)  time: 0.3226  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.3798 (3.3899)  acc1: 21.8750 (22.0180)  acc5: 51.0417 (52.0629)  time: 0.3226  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.3798 (3.3842)  acc1: 21.8750 (22.0400)  acc5: 51.5625 (52.0600)  time: 0.3080  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3290 s / it)
* Acc@1 22.040 Acc@5 52.060 loss 3.384
Accuracy of the network on the 10000 test images: 22.0%
Max accuracy: 22.04%
Epoch: [11]  [  0/195]  eta: 0:04:18  lr: 0.000462  loss: 4.4905 (4.4905)  time: 1.3251  data: 0.5321  max mem: 6883
Epoch: [11]  [ 10/195]  eta: 0:02:36  lr: 0.000462  loss: 4.3461 (4.2932)  time: 0.8458  data: 0.0487  max mem: 6883
Epoch: [11]  [ 20/195]  eta: 0:02:23  lr: 0.000462  loss: 4.2560 (4.2639)  time: 0.7956  data: 0.0003  max mem: 6883
Epoch: [11]  [ 30/195]  eta: 0:02:14  lr: 0.000462  loss: 4.2570 (4.2640)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [11]  [ 40/195]  eta: 0:02:05  lr: 0.000462  loss: 4.2664 (4.2689)  time: 0.7961  data: 0.0002  max mem: 6883
Epoch: [11]  [ 50/195]  eta: 0:01:57  lr: 0.000462  loss: 4.3006 (4.2708)  time: 0.8028  data: 0.0002  max mem: 6883
Epoch: [11]  [ 60/195]  eta: 0:01:48  lr: 0.000462  loss: 4.2734 (4.2621)  time: 0.8030  data: 0.0002  max mem: 6883
Epoch: [11]  [ 70/195]  eta: 0:01:40  lr: 0.000462  loss: 4.2199 (4.2577)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [11]  [ 80/195]  eta: 0:01:32  lr: 0.000462  loss: 4.2500 (4.2475)  time: 0.7944  data: 0.0002  max mem: 6883
Epoch: [11]  [ 90/195]  eta: 0:01:24  lr: 0.000462  loss: 4.2585 (4.2470)  time: 0.7952  data: 0.0002  max mem: 6883
Epoch: [11]  [100/195]  eta: 0:01:16  lr: 0.000462  loss: 4.2585 (4.2456)  time: 0.7949  data: 0.0002  max mem: 6883
Epoch: [11]  [110/195]  eta: 0:01:08  lr: 0.000462  loss: 4.2102 (4.2440)  time: 0.7940  data: 0.0002  max mem: 6883
Epoch: [11]  [120/195]  eta: 0:01:00  lr: 0.000462  loss: 4.2311 (4.2439)  time: 0.7950  data: 0.0003  max mem: 6883
Epoch: [11]  [130/195]  eta: 0:00:52  lr: 0.000462  loss: 4.2663 (4.2466)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [11]  [140/195]  eta: 0:00:44  lr: 0.000462  loss: 4.2747 (4.2453)  time: 0.7948  data: 0.0002  max mem: 6883
Epoch: [11]  [150/195]  eta: 0:00:35  lr: 0.000462  loss: 4.2651 (4.2418)  time: 0.7948  data: 0.0003  max mem: 6883
Epoch: [11]  [160/195]  eta: 0:00:27  lr: 0.000462  loss: 4.2651 (4.2441)  time: 0.7950  data: 0.0002  max mem: 6883
Epoch: [11]  [170/195]  eta: 0:00:19  lr: 0.000462  loss: 4.2901 (4.2444)  time: 0.7944  data: 0.0002  max mem: 6883
Epoch: [11]  [180/195]  eta: 0:00:11  lr: 0.000462  loss: 4.2901 (4.2472)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [11]  [190/195]  eta: 0:00:03  lr: 0.000462  loss: 4.3076 (4.2480)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [11]  [194/195]  eta: 0:00:00  lr: 0.000462  loss: 4.2726 (4.2455)  time: 0.7955  data: 0.0001  max mem: 6883
Epoch: [11] Total time: 0:02:35 (0.7992 s / it)
Averaged stats: lr: 0.000462  loss: 4.2726 (4.2522)
Test:  [ 0/53]  eta: 0:00:53  loss: 3.2303 (3.2303)  acc1: 26.5625 (26.5625)  acc5: 59.3750 (59.3750)  time: 1.0005  data: 0.6781  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.2767 (3.3031)  acc1: 23.9583 (23.7689)  acc5: 52.6042 (53.6932)  time: 0.3840  data: 0.0620  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.2767 (3.2868)  acc1: 23.4375 (23.9335)  acc5: 52.6042 (53.9683)  time: 0.3217  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.2893 (3.2932)  acc1: 23.9583 (23.8911)  acc5: 52.6042 (53.2426)  time: 0.3220  data: 0.0002  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.3100 (3.2996)  acc1: 23.9583 (23.9202)  acc5: 51.5625 (53.0107)  time: 0.3220  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.3100 (3.3012)  acc1: 23.4375 (23.7132)  acc5: 52.6042 (52.8697)  time: 0.3224  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.2756 (3.2934)  acc1: 23.4375 (23.7800)  acc5: 53.1250 (52.8500)  time: 0.3079  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3308 s / it)
* Acc@1 23.780 Acc@5 52.850 loss 3.293
Accuracy of the network on the 10000 test images: 23.8%
Max accuracy: 23.78%
Epoch: [12]  [  0/195]  eta: 0:04:25  lr: 0.000454  loss: 4.3512 (4.3512)  time: 1.3614  data: 0.5675  max mem: 6883
Epoch: [12]  [ 10/195]  eta: 0:02:36  lr: 0.000454  loss: 4.3451 (4.3068)  time: 0.8483  data: 0.0518  max mem: 6883
Epoch: [12]  [ 20/195]  eta: 0:02:24  lr: 0.000454  loss: 4.2315 (4.2404)  time: 0.7961  data: 0.0002  max mem: 6883
Epoch: [12]  [ 30/195]  eta: 0:02:14  lr: 0.000454  loss: 4.1921 (4.2284)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [12]  [ 40/195]  eta: 0:02:05  lr: 0.000454  loss: 4.2696 (4.2384)  time: 0.7959  data: 0.0003  max mem: 6883
Epoch: [12]  [ 50/195]  eta: 0:01:57  lr: 0.000454  loss: 4.2879 (4.2333)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [12]  [ 60/195]  eta: 0:01:48  lr: 0.000454  loss: 4.2549 (4.2303)  time: 0.7960  data: 0.0003  max mem: 6883
Epoch: [12]  [ 70/195]  eta: 0:01:40  lr: 0.000454  loss: 4.2599 (4.2338)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [12]  [ 80/195]  eta: 0:01:32  lr: 0.000454  loss: 4.2156 (4.2255)  time: 0.7957  data: 0.0003  max mem: 6883
Epoch: [12]  [ 90/195]  eta: 0:01:24  lr: 0.000454  loss: 4.1891 (4.2275)  time: 0.7946  data: 0.0002  max mem: 6883
Epoch: [12]  [100/195]  eta: 0:01:16  lr: 0.000454  loss: 4.2899 (4.2321)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [12]  [110/195]  eta: 0:01:08  lr: 0.000454  loss: 4.3397 (4.2394)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [12]  [120/195]  eta: 0:01:00  lr: 0.000454  loss: 4.3178 (4.2424)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [12]  [130/195]  eta: 0:00:52  lr: 0.000454  loss: 4.2676 (4.2374)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [12]  [140/195]  eta: 0:00:44  lr: 0.000454  loss: 4.2237 (4.2335)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [12]  [150/195]  eta: 0:00:35  lr: 0.000454  loss: 4.2760 (4.2391)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [12]  [160/195]  eta: 0:00:27  lr: 0.000454  loss: 4.2571 (4.2368)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [12]  [170/195]  eta: 0:00:19  lr: 0.000454  loss: 4.1882 (4.2368)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [12]  [180/195]  eta: 0:00:11  lr: 0.000454  loss: 4.2624 (4.2371)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [12]  [190/195]  eta: 0:00:03  lr: 0.000454  loss: 4.2808 (4.2405)  time: 0.7952  data: 0.0001  max mem: 6883
Epoch: [12]  [194/195]  eta: 0:00:00  lr: 0.000454  loss: 4.3060 (4.2407)  time: 0.7950  data: 0.0001  max mem: 6883
Epoch: [12] Total time: 0:02:35 (0.7998 s / it)
Averaged stats: lr: 0.000454  loss: 4.3060 (4.2360)
Test:  [ 0/53]  eta: 0:01:06  loss: 3.2871 (3.2871)  acc1: 32.2917 (32.2917)  acc5: 54.6875 (54.6875)  time: 1.2525  data: 0.9331  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.3065 (3.3240)  acc1: 24.4792 (24.9053)  acc5: 54.1667 (53.1723)  time: 0.4064  data: 0.0852  max mem: 6883
Test:  [20/53]  eta: 0:00:12  loss: 3.3065 (3.3133)  acc1: 24.4792 (24.5784)  acc5: 54.1667 (53.4722)  time: 0.3217  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.3064 (3.3178)  acc1: 25.0000 (24.6640)  acc5: 54.1667 (53.7298)  time: 0.3216  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.3238 (3.3255)  acc1: 23.9583 (24.4029)  acc5: 53.1250 (53.5315)  time: 0.3211  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.3468 (3.3269)  acc1: 22.3958 (24.1013)  acc5: 52.0833 (53.4824)  time: 0.3224  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.3207 (3.3224)  acc1: 23.4375 (24.1300)  acc5: 52.6042 (53.4800)  time: 0.3078  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3353 s / it)
* Acc@1 24.130 Acc@5 53.480 loss 3.322
Accuracy of the network on the 10000 test images: 24.1%
Max accuracy: 24.13%
Epoch: [13]  [  0/195]  eta: 0:04:44  lr: 0.000446  loss: 4.1252 (4.1252)  time: 1.4587  data: 0.6638  max mem: 6883
Epoch: [13]  [ 10/195]  eta: 0:02:38  lr: 0.000446  loss: 4.2342 (4.2291)  time: 0.8573  data: 0.0609  max mem: 6883
Epoch: [13]  [ 20/195]  eta: 0:02:24  lr: 0.000446  loss: 4.3289 (4.2912)  time: 0.7954  data: 0.0004  max mem: 6883
Epoch: [13]  [ 30/195]  eta: 0:02:14  lr: 0.000446  loss: 4.3289 (4.2839)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [13]  [ 40/195]  eta: 0:02:05  lr: 0.000446  loss: 4.3090 (4.2964)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [13]  [ 50/195]  eta: 0:01:57  lr: 0.000446  loss: 4.3236 (4.2816)  time: 0.7942  data: 0.0002  max mem: 6883
Epoch: [13]  [ 60/195]  eta: 0:01:48  lr: 0.000446  loss: 4.2497 (4.2672)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [13]  [ 70/195]  eta: 0:01:40  lr: 0.000446  loss: 4.2222 (4.2679)  time: 0.7953  data: 0.0003  max mem: 6883
Epoch: [13]  [ 80/195]  eta: 0:01:32  lr: 0.000446  loss: 4.2222 (4.2621)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [13]  [ 90/195]  eta: 0:01:24  lr: 0.000446  loss: 4.2514 (4.2642)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [13]  [100/195]  eta: 0:01:16  lr: 0.000446  loss: 4.2818 (4.2663)  time: 0.7953  data: 0.0003  max mem: 6883
Epoch: [13]  [110/195]  eta: 0:01:08  lr: 0.000446  loss: 4.3043 (4.2689)  time: 0.7950  data: 0.0003  max mem: 6883
Epoch: [13]  [120/195]  eta: 0:01:00  lr: 0.000446  loss: 4.3043 (4.2672)  time: 0.8002  data: 0.0002  max mem: 6883
Epoch: [13]  [130/195]  eta: 0:00:52  lr: 0.000446  loss: 4.2571 (4.2683)  time: 0.8009  data: 0.0003  max mem: 6883
Epoch: [13]  [140/195]  eta: 0:00:44  lr: 0.000446  loss: 4.3021 (4.2663)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [13]  [150/195]  eta: 0:00:36  lr: 0.000446  loss: 4.3042 (4.2649)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [13]  [160/195]  eta: 0:00:28  lr: 0.000446  loss: 4.2842 (4.2641)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [13]  [170/195]  eta: 0:00:19  lr: 0.000446  loss: 4.3037 (4.2681)  time: 0.7946  data: 0.0002  max mem: 6883
Epoch: [13]  [180/195]  eta: 0:00:11  lr: 0.000446  loss: 4.3037 (4.2678)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [13]  [190/195]  eta: 0:00:03  lr: 0.000446  loss: 4.2631 (4.2665)  time: 0.7951  data: 0.0001  max mem: 6883
Epoch: [13]  [194/195]  eta: 0:00:00  lr: 0.000446  loss: 4.2842 (4.2665)  time: 0.7950  data: 0.0001  max mem: 6883
Epoch: [13] Total time: 0:02:35 (0.7998 s / it)
Averaged stats: lr: 0.000446  loss: 4.2842 (4.2535)
Test:  [ 0/53]  eta: 0:00:51  loss: 3.3116 (3.3116)  acc1: 25.5208 (25.5208)  acc5: 56.2500 (56.2500)  time: 0.9732  data: 0.6558  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.3486 (3.3658)  acc1: 23.4375 (22.8220)  acc5: 53.6458 (53.3617)  time: 0.3811  data: 0.0599  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.3401 (3.3464)  acc1: 22.3958 (22.7927)  acc5: 52.6042 (53.7946)  time: 0.3215  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.3534 (3.3525)  acc1: 22.3958 (22.8831)  acc5: 52.0833 (53.2930)  time: 0.3223  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.3735 (3.3614)  acc1: 22.9167 (22.9294)  acc5: 52.0833 (52.9345)  time: 0.3229  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.3763 (3.3619)  acc1: 22.9167 (22.8554)  acc5: 51.0417 (52.6348)  time: 0.3229  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.3547 (3.3578)  acc1: 22.9167 (22.8800)  acc5: 51.0417 (52.6500)  time: 0.3083  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3308 s / it)
* Acc@1 22.880 Acc@5 52.650 loss 3.358
Accuracy of the network on the 10000 test images: 22.9%
Max accuracy: 24.13%
Epoch: [14]  [  0/195]  eta: 0:04:37  lr: 0.000437  loss: 4.2907 (4.2907)  time: 1.4233  data: 0.6272  max mem: 6883
Epoch: [14]  [ 10/195]  eta: 0:02:37  lr: 0.000437  loss: 4.1487 (4.1831)  time: 0.8522  data: 0.0571  max mem: 6883
Epoch: [14]  [ 20/195]  eta: 0:02:24  lr: 0.000437  loss: 4.1949 (4.1942)  time: 0.7940  data: 0.0002  max mem: 6883
Epoch: [14]  [ 30/195]  eta: 0:02:14  lr: 0.000437  loss: 4.2177 (4.1883)  time: 0.7950  data: 0.0002  max mem: 6883
Epoch: [14]  [ 40/195]  eta: 0:02:05  lr: 0.000437  loss: 4.2438 (4.2185)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [14]  [ 50/195]  eta: 0:01:57  lr: 0.000437  loss: 4.2760 (4.2161)  time: 0.7952  data: 0.0003  max mem: 6883
Epoch: [14]  [ 60/195]  eta: 0:01:48  lr: 0.000437  loss: 4.2121 (4.2172)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [14]  [ 70/195]  eta: 0:01:40  lr: 0.000437  loss: 4.1619 (4.2184)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [14]  [ 80/195]  eta: 0:01:32  lr: 0.000437  loss: 4.2484 (4.2230)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [14]  [ 90/195]  eta: 0:01:24  lr: 0.000437  loss: 4.2785 (4.2276)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [14]  [100/195]  eta: 0:01:16  lr: 0.000437  loss: 4.2456 (4.2258)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [14]  [110/195]  eta: 0:01:08  lr: 0.000437  loss: 4.1761 (4.2226)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [14]  [120/195]  eta: 0:01:00  lr: 0.000437  loss: 4.1407 (4.2195)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [14]  [130/195]  eta: 0:00:52  lr: 0.000437  loss: 4.0943 (4.2085)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [14]  [140/195]  eta: 0:00:43  lr: 0.000437  loss: 4.1116 (4.2085)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [14]  [150/195]  eta: 0:00:35  lr: 0.000437  loss: 4.1848 (4.2065)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [14]  [160/195]  eta: 0:00:27  lr: 0.000437  loss: 4.1299 (4.2028)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [14]  [170/195]  eta: 0:00:19  lr: 0.000437  loss: 4.1754 (4.2069)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [14]  [180/195]  eta: 0:00:11  lr: 0.000437  loss: 4.2064 (4.2051)  time: 0.7999  data: 0.0003  max mem: 6883
Epoch: [14]  [190/195]  eta: 0:00:03  lr: 0.000437  loss: 4.1660 (4.2046)  time: 0.7995  data: 0.0002  max mem: 6883
Epoch: [14]  [194/195]  eta: 0:00:00  lr: 0.000437  loss: 4.1660 (4.2042)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [14] Total time: 0:02:36 (0.8000 s / it)
Averaged stats: lr: 0.000437  loss: 4.1660 (4.2162)
Test:  [ 0/53]  eta: 0:00:59  loss: 3.1858 (3.1858)  acc1: 31.7708 (31.7708)  acc5: 58.8542 (58.8542)  time: 1.1149  data: 0.7962  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.2968 (3.2882)  acc1: 22.9167 (23.1534)  acc5: 54.1667 (55.4924)  time: 0.3939  data: 0.0727  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.2756 (3.2711)  acc1: 23.4375 (24.3552)  acc5: 54.1667 (55.4315)  time: 0.3216  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.2609 (3.2762)  acc1: 26.0417 (24.6808)  acc5: 54.1667 (54.9059)  time: 0.3221  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.2915 (3.2800)  acc1: 23.9583 (24.4411)  acc5: 53.6458 (54.6748)  time: 0.3222  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.2675 (3.2802)  acc1: 23.4375 (24.3770)  acc5: 54.1667 (54.6875)  time: 0.3226  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.2621 (3.2726)  acc1: 23.9583 (24.5400)  acc5: 54.1667 (54.6900)  time: 0.3083  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3331 s / it)
* Acc@1 24.540 Acc@5 54.690 loss 3.273
Accuracy of the network on the 10000 test images: 24.5%
Max accuracy: 24.54%
Epoch: [15]  [  0/195]  eta: 0:04:36  lr: 0.000427  loss: 4.2397 (4.2397)  time: 1.4156  data: 0.6191  max mem: 6883
Epoch: [15]  [ 10/195]  eta: 0:02:38  lr: 0.000427  loss: 4.2636 (4.2880)  time: 0.8545  data: 0.0565  max mem: 6883
Epoch: [15]  [ 20/195]  eta: 0:02:24  lr: 0.000427  loss: 4.2636 (4.2629)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [15]  [ 30/195]  eta: 0:02:14  lr: 0.000427  loss: 4.2550 (4.2286)  time: 0.7956  data: 0.0003  max mem: 6883
Epoch: [15]  [ 40/195]  eta: 0:02:05  lr: 0.000427  loss: 4.1234 (4.2197)  time: 0.7942  data: 0.0003  max mem: 6883
Epoch: [15]  [ 50/195]  eta: 0:01:57  lr: 0.000427  loss: 4.2601 (4.2301)  time: 0.7949  data: 0.0003  max mem: 6883
Epoch: [15]  [ 60/195]  eta: 0:01:48  lr: 0.000427  loss: 4.2506 (4.2165)  time: 0.7942  data: 0.0002  max mem: 6883
Epoch: [15]  [ 70/195]  eta: 0:01:40  lr: 0.000427  loss: 4.2016 (4.2228)  time: 0.7958  data: 0.0003  max mem: 6883
Epoch: [15]  [ 80/195]  eta: 0:01:32  lr: 0.000427  loss: 4.2561 (4.2208)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [15]  [ 90/195]  eta: 0:01:24  lr: 0.000427  loss: 4.2561 (4.2248)  time: 0.7964  data: 0.0003  max mem: 6883
Epoch: [15]  [100/195]  eta: 0:01:16  lr: 0.000427  loss: 4.1944 (4.2184)  time: 0.7955  data: 0.0003  max mem: 6883
Epoch: [15]  [110/195]  eta: 0:01:08  lr: 0.000427  loss: 4.1790 (4.2189)  time: 0.7960  data: 0.0003  max mem: 6883
Epoch: [15]  [120/195]  eta: 0:01:00  lr: 0.000427  loss: 4.2446 (4.2191)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [15]  [130/195]  eta: 0:00:52  lr: 0.000427  loss: 4.2474 (4.2204)  time: 0.7948  data: 0.0002  max mem: 6883
Epoch: [15]  [140/195]  eta: 0:00:44  lr: 0.000427  loss: 4.2173 (4.2184)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [15]  [150/195]  eta: 0:00:35  lr: 0.000427  loss: 4.2050 (4.2222)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [15]  [160/195]  eta: 0:00:27  lr: 0.000427  loss: 4.2097 (4.2200)  time: 0.7952  data: 0.0002  max mem: 6883
Epoch: [15]  [170/195]  eta: 0:00:19  lr: 0.000427  loss: 4.2021 (4.2235)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [15]  [180/195]  eta: 0:00:11  lr: 0.000427  loss: 4.2207 (4.2188)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [15]  [190/195]  eta: 0:00:03  lr: 0.000427  loss: 4.2207 (4.2218)  time: 0.7955  data: 0.0002  max mem: 6883
Epoch: [15]  [194/195]  eta: 0:00:00  lr: 0.000427  loss: 4.2345 (4.2237)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [15] Total time: 0:02:35 (0.7995 s / it)
Averaged stats: lr: 0.000427  loss: 4.2345 (4.2177)
Test:  [ 0/53]  eta: 0:00:54  loss: 3.1819 (3.1819)  acc1: 33.3333 (33.3333)  acc5: 59.3750 (59.3750)  time: 1.0339  data: 0.7163  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.2678 (3.2701)  acc1: 25.0000 (25.4735)  acc5: 55.2083 (55.7765)  time: 0.3880  data: 0.0655  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.2662 (3.2566)  acc1: 25.0000 (25.2480)  acc5: 55.2083 (55.7044)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.2599 (3.2598)  acc1: 26.0417 (25.6552)  acc5: 54.6875 (55.2251)  time: 0.3223  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.2679 (3.2631)  acc1: 25.5208 (25.6860)  acc5: 53.6458 (54.9416)  time: 0.3224  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.2573 (3.2655)  acc1: 25.0000 (25.5004)  acc5: 54.1667 (54.9224)  time: 0.3234  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.2513 (3.2622)  acc1: 25.5208 (25.5300)  acc5: 55.2083 (54.9500)  time: 0.3089  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3321 s / it)
* Acc@1 25.530 Acc@5 54.950 loss 3.262
Accuracy of the network on the 10000 test images: 25.5%
Max accuracy: 25.53%
Epoch: [16]  [  0/195]  eta: 0:04:27  lr: 0.000418  loss: 4.2320 (4.2320)  time: 1.3710  data: 0.5180  max mem: 6883
Epoch: [16]  [ 10/195]  eta: 0:02:38  lr: 0.000418  loss: 4.2345 (4.2293)  time: 0.8568  data: 0.0472  max mem: 6883
Epoch: [16]  [ 20/195]  eta: 0:02:24  lr: 0.000418  loss: 4.2078 (4.1954)  time: 0.7994  data: 0.0002  max mem: 6883
Epoch: [16]  [ 30/195]  eta: 0:02:14  lr: 0.000418  loss: 4.1395 (4.1858)  time: 0.7952  data: 0.0002  max mem: 6883
Epoch: [16]  [ 40/195]  eta: 0:02:05  lr: 0.000418  loss: 4.1885 (4.1870)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [16]  [ 50/195]  eta: 0:01:57  lr: 0.000418  loss: 4.2067 (4.1912)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [16]  [ 60/195]  eta: 0:01:48  lr: 0.000418  loss: 4.2089 (4.1993)  time: 0.7991  data: 0.0002  max mem: 6883
Epoch: [16]  [ 70/195]  eta: 0:01:40  lr: 0.000418  loss: 4.2089 (4.1986)  time: 0.7949  data: 0.0002  max mem: 6883
Epoch: [16]  [ 80/195]  eta: 0:01:32  lr: 0.000418  loss: 4.2735 (4.2078)  time: 0.7954  data: 0.0002  max mem: 6883
Epoch: [16]  [ 90/195]  eta: 0:01:24  lr: 0.000418  loss: 4.2664 (4.2102)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [16]  [100/195]  eta: 0:01:16  lr: 0.000418  loss: 4.2438 (4.2144)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [16]  [110/195]  eta: 0:01:08  lr: 0.000418  loss: 4.2589 (4.2168)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [16]  [120/195]  eta: 0:01:00  lr: 0.000418  loss: 4.3088 (4.2269)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [16]  [130/195]  eta: 0:00:52  lr: 0.000418  loss: 4.3088 (4.2261)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [16]  [140/195]  eta: 0:00:44  lr: 0.000418  loss: 4.2487 (4.2246)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [16]  [150/195]  eta: 0:00:36  lr: 0.000418  loss: 4.1821 (4.2197)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [16]  [160/195]  eta: 0:00:28  lr: 0.000418  loss: 4.1821 (4.2198)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [16]  [170/195]  eta: 0:00:20  lr: 0.000418  loss: 4.2671 (4.2192)  time: 0.7961  data: 0.0002  max mem: 6883
Epoch: [16]  [180/195]  eta: 0:00:11  lr: 0.000418  loss: 4.2671 (4.2195)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [16]  [190/195]  eta: 0:00:03  lr: 0.000418  loss: 4.2483 (4.2192)  time: 0.7957  data: 0.0001  max mem: 6883
Epoch: [16]  [194/195]  eta: 0:00:00  lr: 0.000418  loss: 4.2483 (4.2204)  time: 0.7955  data: 0.0001  max mem: 6883
Epoch: [16] Total time: 0:02:36 (0.8002 s / it)
Averaged stats: lr: 0.000418  loss: 4.2483 (4.2248)
Test:  [ 0/53]  eta: 0:00:45  loss: 3.2222 (3.2222)  acc1: 28.6458 (28.6458)  acc5: 54.6875 (54.6875)  time: 0.8598  data: 0.5404  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 3.2665 (3.2625)  acc1: 23.4375 (24.9053)  acc5: 54.6875 (54.4981)  time: 0.3719  data: 0.0494  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.2281 (3.2451)  acc1: 23.9583 (25.1984)  acc5: 55.7292 (55.5556)  time: 0.3223  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.2281 (3.2468)  acc1: 25.5208 (25.4704)  acc5: 55.7292 (55.4772)  time: 0.3215  data: 0.0002  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.2518 (3.2517)  acc1: 25.5208 (25.3303)  acc5: 54.6875 (55.3608)  time: 0.3214  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 3.2375 (3.2496)  acc1: 25.0000 (25.1021)  acc5: 55.2083 (55.4841)  time: 0.3230  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.2364 (3.2441)  acc1: 25.0000 (25.1400)  acc5: 55.7292 (55.5100)  time: 0.3086  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3286 s / it)
* Acc@1 25.140 Acc@5 55.510 loss 3.244
Accuracy of the network on the 10000 test images: 25.1%
Max accuracy: 25.53%
Epoch: [17]  [  0/195]  eta: 0:04:39  lr: 0.000407  loss: 4.0596 (4.0596)  time: 1.4345  data: 0.6364  max mem: 6883
Epoch: [17]  [ 10/195]  eta: 0:02:38  lr: 0.000407  loss: 4.2572 (4.1991)  time: 0.8543  data: 0.0580  max mem: 6883
Epoch: [17]  [ 20/195]  eta: 0:02:24  lr: 0.000407  loss: 4.1755 (4.1776)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [17]  [ 30/195]  eta: 0:02:14  lr: 0.000407  loss: 4.1639 (4.1825)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [17]  [ 40/195]  eta: 0:02:05  lr: 0.000407  loss: 4.2652 (4.1915)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [17]  [ 50/195]  eta: 0:01:57  lr: 0.000407  loss: 4.2712 (4.1958)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [17]  [ 60/195]  eta: 0:01:48  lr: 0.000407  loss: 4.2435 (4.2036)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [17]  [ 70/195]  eta: 0:01:40  lr: 0.000407  loss: 4.2985 (4.2096)  time: 0.7961  data: 0.0003  max mem: 6883
Epoch: [17]  [ 80/195]  eta: 0:01:32  lr: 0.000407  loss: 4.3004 (4.2172)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [17]  [ 90/195]  eta: 0:01:24  lr: 0.000407  loss: 4.2867 (4.2179)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [17]  [100/195]  eta: 0:01:16  lr: 0.000407  loss: 4.2867 (4.2169)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [17]  [110/195]  eta: 0:01:08  lr: 0.000407  loss: 4.2281 (4.2181)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [17]  [120/195]  eta: 0:01:00  lr: 0.000407  loss: 4.1289 (4.2073)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [17]  [130/195]  eta: 0:00:52  lr: 0.000407  loss: 4.1023 (4.2045)  time: 0.7962  data: 0.0003  max mem: 6883
Epoch: [17]  [140/195]  eta: 0:00:44  lr: 0.000407  loss: 4.2157 (4.2045)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [17]  [150/195]  eta: 0:00:36  lr: 0.000407  loss: 4.1842 (4.2024)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [17]  [160/195]  eta: 0:00:28  lr: 0.000407  loss: 4.2277 (4.2053)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [17]  [170/195]  eta: 0:00:20  lr: 0.000407  loss: 4.2362 (4.2030)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [17]  [180/195]  eta: 0:00:11  lr: 0.000407  loss: 4.1942 (4.2029)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [17]  [190/195]  eta: 0:00:03  lr: 0.000407  loss: 4.1536 (4.2035)  time: 0.7939  data: 0.0001  max mem: 6883
Epoch: [17]  [194/195]  eta: 0:00:00  lr: 0.000407  loss: 4.2192 (4.2043)  time: 0.7941  data: 0.0001  max mem: 6883
Epoch: [17] Total time: 0:02:35 (0.7999 s / it)
Averaged stats: lr: 0.000407  loss: 4.2192 (4.2077)
Test:  [ 0/53]  eta: 0:00:49  loss: 3.2147 (3.2147)  acc1: 30.2083 (30.2083)  acc5: 61.9792 (61.9792)  time: 0.9321  data: 0.6136  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.2609 (3.2576)  acc1: 25.5208 (25.4261)  acc5: 56.2500 (57.0549)  time: 0.3783  data: 0.0561  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.2609 (3.2460)  acc1: 26.5625 (26.3145)  acc5: 56.7708 (56.8452)  time: 0.3223  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.2569 (3.2440)  acc1: 27.6042 (26.8481)  acc5: 55.7292 (56.3676)  time: 0.3218  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.2569 (3.2471)  acc1: 26.0417 (26.5371)  acc5: 55.7292 (56.2500)  time: 0.3221  data: 0.0004  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.2416 (3.2467)  acc1: 25.0000 (26.3072)  acc5: 55.7292 (56.1683)  time: 0.3233  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.2234 (3.2409)  acc1: 25.5208 (26.3400)  acc5: 55.7292 (56.2000)  time: 0.3087  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3299 s / it)
* Acc@1 26.340 Acc@5 56.200 loss 3.241
Accuracy of the network on the 10000 test images: 26.3%
Max accuracy: 26.34%
Epoch: [18]  [  0/195]  eta: 0:04:39  lr: 0.000396  loss: 3.9963 (3.9963)  time: 1.4340  data: 0.6364  max mem: 6883
Epoch: [18]  [ 10/195]  eta: 0:02:38  lr: 0.000396  loss: 4.0937 (4.1287)  time: 0.8555  data: 0.0581  max mem: 6883
Epoch: [18]  [ 20/195]  eta: 0:02:24  lr: 0.000396  loss: 4.2041 (4.1690)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [18]  [ 30/195]  eta: 0:02:14  lr: 0.000396  loss: 4.2118 (4.1716)  time: 0.7951  data: 0.0002  max mem: 6883
Epoch: [18]  [ 40/195]  eta: 0:02:05  lr: 0.000396  loss: 4.1833 (4.1784)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [18]  [ 50/195]  eta: 0:01:57  lr: 0.000396  loss: 4.2625 (4.1972)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [18]  [ 60/195]  eta: 0:01:48  lr: 0.000396  loss: 4.2795 (4.2012)  time: 0.7961  data: 0.0003  max mem: 6883
Epoch: [18]  [ 70/195]  eta: 0:01:40  lr: 0.000396  loss: 4.2514 (4.2129)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [18]  [ 80/195]  eta: 0:01:32  lr: 0.000396  loss: 4.2514 (4.2156)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [18]  [ 90/195]  eta: 0:01:24  lr: 0.000396  loss: 4.2149 (4.2075)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [18]  [100/195]  eta: 0:01:16  lr: 0.000396  loss: 4.1699 (4.2044)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [18]  [110/195]  eta: 0:01:08  lr: 0.000396  loss: 4.2311 (4.2148)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [18]  [120/195]  eta: 0:01:00  lr: 0.000396  loss: 4.2468 (4.2119)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [18]  [130/195]  eta: 0:00:52  lr: 0.000396  loss: 4.2001 (4.2125)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [18]  [140/195]  eta: 0:00:44  lr: 0.000396  loss: 4.2314 (4.2173)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [18]  [150/195]  eta: 0:00:36  lr: 0.000396  loss: 4.2760 (4.2178)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [18]  [160/195]  eta: 0:00:28  lr: 0.000396  loss: 4.2760 (4.2203)  time: 0.7997  data: 0.0003  max mem: 6883
Epoch: [18]  [170/195]  eta: 0:00:20  lr: 0.000396  loss: 4.2620 (4.2214)  time: 0.7990  data: 0.0003  max mem: 6883
Epoch: [18]  [180/195]  eta: 0:00:12  lr: 0.000396  loss: 4.2545 (4.2209)  time: 0.7984  data: 0.0003  max mem: 6883
Epoch: [18]  [190/195]  eta: 0:00:04  lr: 0.000396  loss: 4.2545 (4.2203)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [18]  [194/195]  eta: 0:00:00  lr: 0.000396  loss: 4.2238 (4.2194)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [18] Total time: 0:02:36 (0.8006 s / it)
Averaged stats: lr: 0.000396  loss: 4.2238 (4.2130)
Test:  [ 0/53]  eta: 0:00:58  loss: 3.1845 (3.1845)  acc1: 31.2500 (31.2500)  acc5: 58.3333 (58.3333)  time: 1.1053  data: 0.7871  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.1860 (3.1921)  acc1: 25.5208 (25.7102)  acc5: 54.1667 (55.3504)  time: 0.3933  data: 0.0719  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.1792 (3.1693)  acc1: 25.5208 (26.0417)  acc5: 56.2500 (56.3988)  time: 0.3217  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.1588 (3.1706)  acc1: 26.5625 (26.5625)  acc5: 57.8125 (56.4180)  time: 0.3222  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.1714 (3.1746)  acc1: 26.0417 (26.3720)  acc5: 56.2500 (56.4787)  time: 0.3224  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.1891 (3.1770)  acc1: 26.0417 (26.1540)  acc5: 56.2500 (56.3726)  time: 0.3227  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.1700 (3.1703)  acc1: 26.0417 (26.1700)  acc5: 56.2500 (56.3400)  time: 0.3080  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3331 s / it)
* Acc@1 26.170 Acc@5 56.340 loss 3.170
Accuracy of the network on the 10000 test images: 26.2%
Max accuracy: 26.34%
Epoch: [19]  [  0/195]  eta: 0:05:01  lr: 0.000385  loss: 4.0718 (4.0718)  time: 1.5482  data: 0.5394  max mem: 6883
Epoch: [19]  [ 10/195]  eta: 0:02:40  lr: 0.000385  loss: 4.2096 (4.1924)  time: 0.8665  data: 0.0492  max mem: 6883
Epoch: [19]  [ 20/195]  eta: 0:02:25  lr: 0.000385  loss: 4.2680 (4.1942)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [19]  [ 30/195]  eta: 0:02:15  lr: 0.000385  loss: 4.2999 (4.2123)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [19]  [ 40/195]  eta: 0:02:06  lr: 0.000385  loss: 4.2888 (4.2255)  time: 0.8000  data: 0.0002  max mem: 6883
Epoch: [19]  [ 50/195]  eta: 0:01:57  lr: 0.000385  loss: 4.2144 (4.2243)  time: 0.7994  data: 0.0002  max mem: 6883
Epoch: [19]  [ 60/195]  eta: 0:01:49  lr: 0.000385  loss: 4.2144 (4.2280)  time: 0.7965  data: 0.0003  max mem: 6883
Epoch: [19]  [ 70/195]  eta: 0:01:41  lr: 0.000385  loss: 4.2824 (4.2252)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [19]  [ 80/195]  eta: 0:01:32  lr: 0.000385  loss: 4.2655 (4.2289)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [19]  [ 90/195]  eta: 0:01:24  lr: 0.000385  loss: 4.2501 (4.2244)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [19]  [100/195]  eta: 0:01:16  lr: 0.000385  loss: 4.1917 (4.2241)  time: 0.7989  data: 0.0003  max mem: 6883
Epoch: [19]  [110/195]  eta: 0:01:08  lr: 0.000385  loss: 4.1525 (4.2108)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [19]  [120/195]  eta: 0:01:00  lr: 0.000385  loss: 4.2029 (4.2200)  time: 0.7984  data: 0.0003  max mem: 6883
Epoch: [19]  [130/195]  eta: 0:00:52  lr: 0.000385  loss: 4.2874 (4.2172)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [19]  [140/195]  eta: 0:00:44  lr: 0.000385  loss: 4.1730 (4.2120)  time: 0.7984  data: 0.0003  max mem: 6883
Epoch: [19]  [150/195]  eta: 0:00:36  lr: 0.000385  loss: 4.2644 (4.2161)  time: 0.7989  data: 0.0003  max mem: 6883
Epoch: [19]  [160/195]  eta: 0:00:28  lr: 0.000385  loss: 4.2509 (4.2112)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [19]  [170/195]  eta: 0:00:20  lr: 0.000385  loss: 4.1998 (4.2113)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [19]  [180/195]  eta: 0:00:12  lr: 0.000385  loss: 4.1998 (4.2053)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [19]  [190/195]  eta: 0:00:04  lr: 0.000385  loss: 4.1419 (4.2038)  time: 0.7962  data: 0.0001  max mem: 6883
Epoch: [19]  [194/195]  eta: 0:00:00  lr: 0.000385  loss: 4.1424 (4.1999)  time: 0.7959  data: 0.0001  max mem: 6883
Epoch: [19] Total time: 0:02:36 (0.8022 s / it)
Averaged stats: lr: 0.000385  loss: 4.1424 (4.2048)
Test:  [ 0/53]  eta: 0:00:58  loss: 3.1725 (3.1725)  acc1: 30.2083 (30.2083)  acc5: 58.3333 (58.3333)  time: 1.0996  data: 0.7822  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.1793 (3.1933)  acc1: 26.0417 (26.4205)  acc5: 57.8125 (56.7235)  time: 0.3948  data: 0.0715  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.1763 (3.1796)  acc1: 26.0417 (26.5129)  acc5: 56.2500 (56.7708)  time: 0.3230  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.1763 (3.1839)  acc1: 26.5625 (26.3777)  acc5: 55.7292 (56.2668)  time: 0.3218  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.1931 (3.1862)  acc1: 26.0417 (25.8892)  acc5: 55.2083 (55.9832)  time: 0.3218  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.1804 (3.1864)  acc1: 26.0417 (25.8068)  acc5: 55.2083 (55.9538)  time: 0.3233  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.1486 (3.1808)  acc1: 26.0417 (25.9300)  acc5: 55.7292 (55.9600)  time: 0.3087  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3337 s / it)
* Acc@1 25.930 Acc@5 55.960 loss 3.181
Accuracy of the network on the 10000 test images: 25.9%
Max accuracy: 26.34%
Epoch: [20]  [  0/195]  eta: 0:04:22  lr: 0.000374  loss: 4.1841 (4.1841)  time: 1.3444  data: 0.5484  max mem: 6883
Epoch: [20]  [ 10/195]  eta: 0:02:36  lr: 0.000374  loss: 4.1755 (4.1524)  time: 0.8483  data: 0.0501  max mem: 6883
Epoch: [20]  [ 20/195]  eta: 0:02:23  lr: 0.000374  loss: 4.2127 (4.1906)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [20]  [ 30/195]  eta: 0:02:14  lr: 0.000374  loss: 4.2023 (4.1532)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [20]  [ 40/195]  eta: 0:02:05  lr: 0.000374  loss: 4.0495 (4.1506)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [20]  [ 50/195]  eta: 0:01:57  lr: 0.000374  loss: 4.1012 (4.1466)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [20]  [ 60/195]  eta: 0:01:48  lr: 0.000374  loss: 4.1741 (4.1603)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [20]  [ 70/195]  eta: 0:01:40  lr: 0.000374  loss: 4.2907 (4.1797)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [20]  [ 80/195]  eta: 0:01:32  lr: 0.000374  loss: 4.2633 (4.1771)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [20]  [ 90/195]  eta: 0:01:24  lr: 0.000374  loss: 4.2166 (4.1799)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [20]  [100/195]  eta: 0:01:16  lr: 0.000374  loss: 4.2318 (4.1823)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [20]  [110/195]  eta: 0:01:08  lr: 0.000374  loss: 4.1218 (4.1826)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [20]  [120/195]  eta: 0:01:00  lr: 0.000374  loss: 4.2151 (4.1873)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [20]  [130/195]  eta: 0:00:52  lr: 0.000374  loss: 4.2134 (4.1882)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [20]  [140/195]  eta: 0:00:44  lr: 0.000374  loss: 4.2077 (4.1904)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [20]  [150/195]  eta: 0:00:36  lr: 0.000374  loss: 4.2704 (4.2001)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [20]  [160/195]  eta: 0:00:28  lr: 0.000374  loss: 4.2810 (4.2027)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [20]  [170/195]  eta: 0:00:20  lr: 0.000374  loss: 4.1515 (4.1965)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [20]  [180/195]  eta: 0:00:12  lr: 0.000374  loss: 4.0432 (4.1936)  time: 0.8010  data: 0.0002  max mem: 6883
Epoch: [20]  [190/195]  eta: 0:00:04  lr: 0.000374  loss: 4.0835 (4.1883)  time: 0.8002  data: 0.0001  max mem: 6883
Epoch: [20]  [194/195]  eta: 0:00:00  lr: 0.000374  loss: 4.0835 (4.1905)  time: 0.7963  data: 0.0001  max mem: 6883
Epoch: [20] Total time: 0:02:36 (0.8007 s / it)
Averaged stats: lr: 0.000374  loss: 4.0835 (4.1879)
Test:  [ 0/53]  eta: 0:01:03  loss: 3.1171 (3.1171)  acc1: 32.2917 (32.2917)  acc5: 61.4583 (61.4583)  time: 1.1973  data: 0.8792  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.1806 (3.1561)  acc1: 26.0417 (26.7992)  acc5: 57.2917 (56.5341)  time: 0.4043  data: 0.0803  max mem: 6883
Test:  [20/53]  eta: 0:00:12  loss: 3.1250 (3.1294)  acc1: 27.6042 (27.2321)  acc5: 57.2917 (57.7133)  time: 0.3234  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.1223 (3.1308)  acc1: 28.1250 (27.7050)  acc5: 57.2917 (57.5773)  time: 0.3232  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.1300 (3.1359)  acc1: 27.0833 (27.1977)  acc5: 56.7708 (57.5330)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.1175 (3.1368)  acc1: 26.0417 (26.7565)  acc5: 57.2917 (57.4142)  time: 0.3232  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.1121 (3.1305)  acc1: 26.0417 (26.8100)  acc5: 57.2917 (57.3800)  time: 0.3089  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3360 s / it)
* Acc@1 26.810 Acc@5 57.380 loss 3.130
Accuracy of the network on the 10000 test images: 26.8%
Max accuracy: 26.81%
Epoch: [21]  [  0/195]  eta: 0:04:18  lr: 0.000362  loss: 4.2946 (4.2946)  time: 1.3238  data: 0.5093  max mem: 6883
Epoch: [21]  [ 10/195]  eta: 0:02:36  lr: 0.000362  loss: 4.0502 (4.0905)  time: 0.8454  data: 0.0465  max mem: 6883
Epoch: [21]  [ 20/195]  eta: 0:02:23  lr: 0.000362  loss: 4.1401 (4.1364)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [21]  [ 30/195]  eta: 0:02:14  lr: 0.000362  loss: 4.1950 (4.1412)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [21]  [ 40/195]  eta: 0:02:05  lr: 0.000362  loss: 4.0899 (4.1429)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [21]  [ 50/195]  eta: 0:01:57  lr: 0.000362  loss: 4.1412 (4.1470)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [21]  [ 60/195]  eta: 0:01:48  lr: 0.000362  loss: 4.1660 (4.1461)  time: 0.7988  data: 0.0004  max mem: 6883
Epoch: [21]  [ 70/195]  eta: 0:01:40  lr: 0.000362  loss: 4.1883 (4.1555)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [21]  [ 80/195]  eta: 0:01:32  lr: 0.000362  loss: 4.2970 (4.1667)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [21]  [ 90/195]  eta: 0:01:24  lr: 0.000362  loss: 4.2970 (4.1738)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [21]  [100/195]  eta: 0:01:16  lr: 0.000362  loss: 4.1897 (4.1723)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [21]  [110/195]  eta: 0:01:08  lr: 0.000362  loss: 4.2389 (4.1748)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [21]  [120/195]  eta: 0:01:00  lr: 0.000362  loss: 4.2464 (4.1765)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [21]  [130/195]  eta: 0:00:52  lr: 0.000362  loss: 4.2219 (4.1783)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [21]  [140/195]  eta: 0:00:44  lr: 0.000362  loss: 4.1449 (4.1757)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [21]  [150/195]  eta: 0:00:36  lr: 0.000362  loss: 4.1449 (4.1797)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [21]  [160/195]  eta: 0:00:28  lr: 0.000362  loss: 4.2905 (4.1817)  time: 0.7985  data: 0.0003  max mem: 6883
Epoch: [21]  [170/195]  eta: 0:00:20  lr: 0.000362  loss: 4.2140 (4.1780)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [21]  [180/195]  eta: 0:00:12  lr: 0.000362  loss: 4.2140 (4.1791)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [21]  [190/195]  eta: 0:00:04  lr: 0.000362  loss: 4.2044 (4.1781)  time: 0.7964  data: 0.0001  max mem: 6883
Epoch: [21]  [194/195]  eta: 0:00:00  lr: 0.000362  loss: 4.1857 (4.1791)  time: 0.7969  data: 0.0001  max mem: 6883
Epoch: [21] Total time: 0:02:36 (0.8006 s / it)
Averaged stats: lr: 0.000362  loss: 4.1857 (4.1902)
Test:  [ 0/53]  eta: 0:00:44  loss: 3.2139 (3.2139)  acc1: 32.2917 (32.2917)  acc5: 53.1250 (53.1250)  time: 0.8442  data: 0.5260  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 3.2266 (3.2249)  acc1: 26.5625 (25.5682)  acc5: 54.1667 (53.9773)  time: 0.3719  data: 0.0482  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.1966 (3.1938)  acc1: 26.5625 (26.5377)  acc5: 54.1667 (55.2331)  time: 0.3233  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.1829 (3.1948)  acc1: 27.0833 (26.8313)  acc5: 54.6875 (55.2587)  time: 0.3224  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.1981 (3.1967)  acc1: 26.5625 (26.6133)  acc5: 55.7292 (55.3354)  time: 0.3225  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.1860 (3.1977)  acc1: 25.5208 (26.4297)  acc5: 54.6875 (55.2185)  time: 0.3236  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.1818 (3.1915)  acc1: 25.5208 (26.3900)  acc5: 55.7292 (55.2000)  time: 0.3093  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3297 s / it)
* Acc@1 26.390 Acc@5 55.200 loss 3.192
Accuracy of the network on the 10000 test images: 26.4%
Max accuracy: 26.81%
Epoch: [22]  [  0/195]  eta: 0:04:44  lr: 0.000350  loss: 4.4461 (4.4461)  time: 1.4569  data: 0.5651  max mem: 6883
Epoch: [22]  [ 10/195]  eta: 0:02:38  lr: 0.000350  loss: 4.2521 (4.2519)  time: 0.8562  data: 0.0516  max mem: 6883
Epoch: [22]  [ 20/195]  eta: 0:02:25  lr: 0.000350  loss: 4.2305 (4.2247)  time: 0.7996  data: 0.0003  max mem: 6883
Epoch: [22]  [ 30/195]  eta: 0:02:15  lr: 0.000350  loss: 4.2440 (4.2164)  time: 0.8008  data: 0.0002  max mem: 6883
Epoch: [22]  [ 40/195]  eta: 0:02:06  lr: 0.000350  loss: 4.2495 (4.1964)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [22]  [ 50/195]  eta: 0:01:57  lr: 0.000350  loss: 4.2001 (4.1890)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [22]  [ 60/195]  eta: 0:01:49  lr: 0.000350  loss: 4.2001 (4.1796)  time: 0.7989  data: 0.0003  max mem: 6883
Epoch: [22]  [ 70/195]  eta: 0:01:40  lr: 0.000350  loss: 4.1997 (4.1850)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [22]  [ 80/195]  eta: 0:01:32  lr: 0.000350  loss: 4.1997 (4.1848)  time: 0.7971  data: 0.0004  max mem: 6883
Epoch: [22]  [ 90/195]  eta: 0:01:24  lr: 0.000350  loss: 4.1603 (4.1842)  time: 0.7972  data: 0.0004  max mem: 6883
Epoch: [22]  [100/195]  eta: 0:01:16  lr: 0.000350  loss: 4.2422 (4.1835)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [22]  [110/195]  eta: 0:01:08  lr: 0.000350  loss: 4.3077 (4.1929)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [22]  [120/195]  eta: 0:01:00  lr: 0.000350  loss: 4.2607 (4.1948)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [22]  [130/195]  eta: 0:00:52  lr: 0.000350  loss: 4.1954 (4.1905)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [22]  [140/195]  eta: 0:00:44  lr: 0.000350  loss: 4.1910 (4.1866)  time: 0.7985  data: 0.0002  max mem: 6883
Epoch: [22]  [150/195]  eta: 0:00:36  lr: 0.000350  loss: 4.2433 (4.1886)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [22]  [160/195]  eta: 0:00:28  lr: 0.000350  loss: 4.2737 (4.1898)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [22]  [170/195]  eta: 0:00:20  lr: 0.000350  loss: 4.1552 (4.1861)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [22]  [180/195]  eta: 0:00:12  lr: 0.000350  loss: 4.2518 (4.1923)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [22]  [190/195]  eta: 0:00:04  lr: 0.000350  loss: 4.2518 (4.1896)  time: 0.7975  data: 0.0001  max mem: 6883
Epoch: [22]  [194/195]  eta: 0:00:00  lr: 0.000350  loss: 4.2509 (4.1909)  time: 0.7971  data: 0.0001  max mem: 6883
Epoch: [22] Total time: 0:02:36 (0.8018 s / it)
Averaged stats: lr: 0.000350  loss: 4.2509 (4.1899)
Test:  [ 0/53]  eta: 0:01:02  loss: 3.1279 (3.1279)  acc1: 31.7708 (31.7708)  acc5: 58.8542 (58.8542)  time: 1.1773  data: 0.8589  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.1397 (3.1427)  acc1: 27.6042 (27.5095)  acc5: 58.8542 (58.2386)  time: 0.4023  data: 0.0785  max mem: 6883
Test:  [20/53]  eta: 0:00:12  loss: 3.1226 (3.1222)  acc1: 28.6458 (28.5962)  acc5: 58.8542 (58.8542)  time: 0.3232  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.1118 (3.1300)  acc1: 28.6458 (28.2594)  acc5: 58.8542 (58.4845)  time: 0.3219  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.1413 (3.1348)  acc1: 26.5625 (27.9599)  acc5: 57.8125 (58.0539)  time: 0.3222  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.1413 (3.1352)  acc1: 26.0417 (27.6246)  acc5: 57.2917 (58.1291)  time: 0.3238  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.1364 (3.1283)  acc1: 26.0417 (27.6500)  acc5: 57.8125 (58.1600)  time: 0.3091  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3354 s / it)
* Acc@1 27.650 Acc@5 58.160 loss 3.128
Accuracy of the network on the 10000 test images: 27.7%
Max accuracy: 27.65%
Epoch: [23]  [  0/195]  eta: 0:04:55  lr: 0.000337  loss: 3.9819 (3.9819)  time: 1.5166  data: 0.7157  max mem: 6883
Epoch: [23]  [ 10/195]  eta: 0:02:39  lr: 0.000337  loss: 4.0758 (4.1224)  time: 0.8644  data: 0.0653  max mem: 6883
Epoch: [23]  [ 20/195]  eta: 0:02:25  lr: 0.000337  loss: 4.1215 (4.1350)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [23]  [ 30/195]  eta: 0:02:15  lr: 0.000337  loss: 4.2073 (4.1463)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [23]  [ 40/195]  eta: 0:02:06  lr: 0.000337  loss: 4.2691 (4.1635)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [23]  [ 50/195]  eta: 0:01:57  lr: 0.000337  loss: 4.2720 (4.1753)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [23]  [ 60/195]  eta: 0:01:49  lr: 0.000337  loss: 4.2345 (4.1739)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [23]  [ 70/195]  eta: 0:01:40  lr: 0.000337  loss: 4.2253 (4.1792)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [23]  [ 80/195]  eta: 0:01:32  lr: 0.000337  loss: 4.2554 (4.1842)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [23]  [ 90/195]  eta: 0:01:24  lr: 0.000337  loss: 4.2509 (4.1786)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [23]  [100/195]  eta: 0:01:16  lr: 0.000337  loss: 4.2491 (4.1874)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [23]  [110/195]  eta: 0:01:08  lr: 0.000337  loss: 4.2491 (4.1783)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [23]  [120/195]  eta: 0:01:00  lr: 0.000337  loss: 4.1453 (4.1763)  time: 0.7988  data: 0.0002  max mem: 6883
Epoch: [23]  [130/195]  eta: 0:00:52  lr: 0.000337  loss: 4.1453 (4.1692)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [23]  [140/195]  eta: 0:00:44  lr: 0.000337  loss: 4.1920 (4.1689)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [23]  [150/195]  eta: 0:00:36  lr: 0.000337  loss: 4.2133 (4.1690)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [23]  [160/195]  eta: 0:00:28  lr: 0.000337  loss: 4.1688 (4.1707)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [23]  [170/195]  eta: 0:00:20  lr: 0.000337  loss: 4.1942 (4.1713)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [23]  [180/195]  eta: 0:00:12  lr: 0.000337  loss: 4.2823 (4.1752)  time: 0.7950  data: 0.0002  max mem: 6883
Epoch: [23]  [190/195]  eta: 0:00:04  lr: 0.000337  loss: 4.2906 (4.1776)  time: 0.7946  data: 0.0001  max mem: 6883
Epoch: [23]  [194/195]  eta: 0:00:00  lr: 0.000337  loss: 4.2823 (4.1774)  time: 0.7956  data: 0.0001  max mem: 6883
Epoch: [23] Total time: 0:02:36 (0.8015 s / it)
Averaged stats: lr: 0.000337  loss: 4.2823 (4.1836)
Test:  [ 0/53]  eta: 0:01:00  loss: 3.1091 (3.1091)  acc1: 30.2083 (30.2083)  acc5: 58.8542 (58.8542)  time: 1.1373  data: 0.8179  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.1310 (3.1328)  acc1: 28.6458 (27.5095)  acc5: 58.8542 (58.3333)  time: 0.3964  data: 0.0747  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.0912 (3.1113)  acc1: 28.1250 (27.9266)  acc5: 58.8542 (58.8294)  time: 0.3222  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.0912 (3.1143)  acc1: 28.1250 (28.1418)  acc5: 58.8542 (58.6694)  time: 0.3233  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.1266 (3.1214)  acc1: 26.5625 (27.7693)  acc5: 57.8125 (58.4096)  time: 0.3230  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.1266 (3.1225)  acc1: 26.5625 (27.5225)  acc5: 56.7708 (58.2516)  time: 0.3232  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.1094 (3.1171)  acc1: 26.5625 (27.5200)  acc5: 58.8542 (58.2900)  time: 0.3088  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3346 s / it)
* Acc@1 27.520 Acc@5 58.290 loss 3.117
Accuracy of the network on the 10000 test images: 27.5%
Max accuracy: 27.65%
Epoch: [24]  [  0/195]  eta: 0:04:55  lr: 0.000325  loss: 4.0499 (4.0499)  time: 1.5150  data: 0.6517  max mem: 6883
Epoch: [24]  [ 10/195]  eta: 0:02:39  lr: 0.000325  loss: 4.2455 (4.2139)  time: 0.8622  data: 0.0594  max mem: 6883
Epoch: [24]  [ 20/195]  eta: 0:02:25  lr: 0.000325  loss: 4.2507 (4.2247)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [24]  [ 30/195]  eta: 0:02:15  lr: 0.000325  loss: 4.2157 (4.1877)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [24]  [ 40/195]  eta: 0:02:06  lr: 0.000325  loss: 4.2854 (4.2200)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [24]  [ 50/195]  eta: 0:01:57  lr: 0.000325  loss: 4.3069 (4.2001)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [24]  [ 60/195]  eta: 0:01:49  lr: 0.000325  loss: 4.2199 (4.2089)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [24]  [ 70/195]  eta: 0:01:40  lr: 0.000325  loss: 4.2199 (4.2032)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [24]  [ 80/195]  eta: 0:01:32  lr: 0.000325  loss: 4.1592 (4.1967)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [24]  [ 90/195]  eta: 0:01:24  lr: 0.000325  loss: 4.1404 (4.1907)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [24]  [100/195]  eta: 0:01:16  lr: 0.000325  loss: 4.1404 (4.1899)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [24]  [110/195]  eta: 0:01:08  lr: 0.000325  loss: 4.2138 (4.1892)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [24]  [120/195]  eta: 0:01:00  lr: 0.000325  loss: 4.2138 (4.1814)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [24]  [130/195]  eta: 0:00:52  lr: 0.000325  loss: 4.1077 (4.1721)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [24]  [140/195]  eta: 0:00:44  lr: 0.000325  loss: 4.1444 (4.1760)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [24]  [150/195]  eta: 0:00:36  lr: 0.000325  loss: 4.2476 (4.1730)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [24]  [160/195]  eta: 0:00:28  lr: 0.000325  loss: 4.2241 (4.1724)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [24]  [170/195]  eta: 0:00:20  lr: 0.000325  loss: 4.2384 (4.1737)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [24]  [180/195]  eta: 0:00:12  lr: 0.000325  loss: 4.1957 (4.1679)  time: 0.7965  data: 0.0003  max mem: 6883
Epoch: [24]  [190/195]  eta: 0:00:04  lr: 0.000325  loss: 4.0253 (4.1611)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [24]  [194/195]  eta: 0:00:00  lr: 0.000325  loss: 4.0253 (4.1604)  time: 0.7963  data: 0.0001  max mem: 6883
Epoch: [24] Total time: 0:02:36 (0.8015 s / it)
Averaged stats: lr: 0.000325  loss: 4.0253 (4.1582)
Test:  [ 0/53]  eta: 0:01:00  loss: 3.0668 (3.0668)  acc1: 29.6875 (29.6875)  acc5: 57.2917 (57.2917)  time: 1.1409  data: 0.8232  max mem: 6883
Test:  [10/53]  eta: 0:00:17  loss: 3.0668 (3.0737)  acc1: 27.0833 (28.1723)  acc5: 57.8125 (58.5227)  time: 0.3980  data: 0.0752  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.0465 (3.0526)  acc1: 28.6458 (28.9931)  acc5: 58.3333 (58.8294)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 3.0354 (3.0507)  acc1: 29.1667 (29.0827)  acc5: 58.8542 (58.9214)  time: 0.3226  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.0441 (3.0551)  acc1: 27.6042 (28.5950)  acc5: 58.8542 (58.7907)  time: 0.3225  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.0441 (3.0578)  acc1: 26.5625 (28.4109)  acc5: 58.3333 (58.6703)  time: 0.3234  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.0342 (3.0514)  acc1: 26.5625 (28.3900)  acc5: 58.3333 (58.6600)  time: 0.3091  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3351 s / it)
* Acc@1 28.390 Acc@5 58.660 loss 3.051
Accuracy of the network on the 10000 test images: 28.4%
Max accuracy: 28.39%
Epoch: [25]  [  0/195]  eta: 0:04:45  lr: 0.000313  loss: 4.1788 (4.1788)  time: 1.4651  data: 0.6705  max mem: 6883
Epoch: [25]  [ 10/195]  eta: 0:02:39  lr: 0.000313  loss: 4.1966 (4.1998)  time: 0.8640  data: 0.0612  max mem: 6883
Epoch: [25]  [ 20/195]  eta: 0:02:25  lr: 0.000313  loss: 4.1966 (4.1919)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [25]  [ 30/195]  eta: 0:02:15  lr: 0.000313  loss: 4.1247 (4.1837)  time: 0.7953  data: 0.0002  max mem: 6883
Epoch: [25]  [ 40/195]  eta: 0:02:06  lr: 0.000313  loss: 4.1247 (4.1783)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [25]  [ 50/195]  eta: 0:01:57  lr: 0.000313  loss: 4.1483 (4.1707)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [25]  [ 60/195]  eta: 0:01:49  lr: 0.000313  loss: 4.2488 (4.1826)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [25]  [ 70/195]  eta: 0:01:40  lr: 0.000313  loss: 4.2265 (4.1751)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [25]  [ 80/195]  eta: 0:01:32  lr: 0.000313  loss: 4.1611 (4.1663)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [25]  [ 90/195]  eta: 0:01:24  lr: 0.000313  loss: 4.1262 (4.1588)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [25]  [100/195]  eta: 0:01:16  lr: 0.000313  loss: 4.1596 (4.1647)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [25]  [110/195]  eta: 0:01:08  lr: 0.000313  loss: 4.2715 (4.1728)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [25]  [120/195]  eta: 0:01:00  lr: 0.000313  loss: 4.2519 (4.1735)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [25]  [130/195]  eta: 0:00:52  lr: 0.000313  loss: 4.2460 (4.1795)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [25]  [140/195]  eta: 0:00:44  lr: 0.000313  loss: 4.1939 (4.1762)  time: 0.8017  data: 0.0002  max mem: 6883
Epoch: [25]  [150/195]  eta: 0:00:36  lr: 0.000313  loss: 4.1827 (4.1746)  time: 0.8013  data: 0.0002  max mem: 6883
Epoch: [25]  [160/195]  eta: 0:00:28  lr: 0.000313  loss: 4.1204 (4.1666)  time: 0.7977  data: 0.0004  max mem: 6883
Epoch: [25]  [170/195]  eta: 0:00:20  lr: 0.000313  loss: 4.1594 (4.1675)  time: 0.7980  data: 0.0004  max mem: 6883
Epoch: [25]  [180/195]  eta: 0:00:12  lr: 0.000313  loss: 4.1867 (4.1640)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [25]  [190/195]  eta: 0:00:04  lr: 0.000313  loss: 4.1867 (4.1646)  time: 0.7984  data: 0.0001  max mem: 6883
Epoch: [25]  [194/195]  eta: 0:00:00  lr: 0.000313  loss: 4.1820 (4.1646)  time: 0.7986  data: 0.0001  max mem: 6883
Epoch: [25] Total time: 0:02:36 (0.8022 s / it)
Averaged stats: lr: 0.000313  loss: 4.1820 (4.1578)
Test:  [ 0/53]  eta: 0:00:55  loss: 3.1076 (3.1076)  acc1: 33.3333 (33.3333)  acc5: 60.4167 (60.4167)  time: 1.0515  data: 0.7316  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.1072 (3.1003)  acc1: 27.6042 (28.5038)  acc5: 58.3333 (58.0019)  time: 0.3910  data: 0.0669  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.0842 (3.0777)  acc1: 27.6042 (28.7450)  acc5: 57.8125 (58.7550)  time: 0.3234  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.0822 (3.0767)  acc1: 28.6458 (28.9651)  acc5: 59.3750 (58.8206)  time: 0.3227  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.0822 (3.0795)  acc1: 28.1250 (28.8110)  acc5: 59.3750 (58.9685)  time: 0.3226  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.0829 (3.0815)  acc1: 28.1250 (28.5743)  acc5: 58.3333 (58.7623)  time: 0.3240  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.0765 (3.0761)  acc1: 28.1250 (28.5500)  acc5: 59.3750 (58.7900)  time: 0.3093  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3334 s / it)
* Acc@1 28.550 Acc@5 58.790 loss 3.076
Accuracy of the network on the 10000 test images: 28.6%
Max accuracy: 28.55%
Epoch: [26]  [  0/195]  eta: 0:04:35  lr: 0.000300  loss: 3.9855 (3.9855)  time: 1.4151  data: 0.6154  max mem: 6883
Epoch: [26]  [ 10/195]  eta: 0:02:38  lr: 0.000300  loss: 4.2277 (4.1605)  time: 0.8565  data: 0.0561  max mem: 6883
Epoch: [26]  [ 20/195]  eta: 0:02:24  lr: 0.000300  loss: 4.2277 (4.1571)  time: 0.7991  data: 0.0002  max mem: 6883
Epoch: [26]  [ 30/195]  eta: 0:02:15  lr: 0.000300  loss: 4.1595 (4.1466)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [26]  [ 40/195]  eta: 0:02:06  lr: 0.000300  loss: 4.1595 (4.1602)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [26]  [ 50/195]  eta: 0:01:57  lr: 0.000300  loss: 4.2189 (4.1613)  time: 0.7988  data: 0.0002  max mem: 6883
Epoch: [26]  [ 60/195]  eta: 0:01:49  lr: 0.000300  loss: 4.1910 (4.1684)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [26]  [ 70/195]  eta: 0:01:40  lr: 0.000300  loss: 4.1843 (4.1724)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [26]  [ 80/195]  eta: 0:01:32  lr: 0.000300  loss: 4.1902 (4.1708)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [26]  [ 90/195]  eta: 0:01:24  lr: 0.000300  loss: 4.2055 (4.1758)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [26]  [100/195]  eta: 0:01:16  lr: 0.000300  loss: 4.1993 (4.1755)  time: 0.7988  data: 0.0002  max mem: 6883
Epoch: [26]  [110/195]  eta: 0:01:08  lr: 0.000300  loss: 4.2035 (4.1781)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [26]  [120/195]  eta: 0:01:00  lr: 0.000300  loss: 4.1863 (4.1728)  time: 0.7994  data: 0.0002  max mem: 6883
Epoch: [26]  [130/195]  eta: 0:00:52  lr: 0.000300  loss: 4.1569 (4.1748)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [26]  [140/195]  eta: 0:00:44  lr: 0.000300  loss: 4.2514 (4.1768)  time: 0.7992  data: 0.0002  max mem: 6883
Epoch: [26]  [150/195]  eta: 0:00:36  lr: 0.000300  loss: 4.2151 (4.1754)  time: 0.7999  data: 0.0002  max mem: 6883
Epoch: [26]  [160/195]  eta: 0:00:28  lr: 0.000300  loss: 4.1652 (4.1720)  time: 0.7997  data: 0.0002  max mem: 6883
Epoch: [26]  [170/195]  eta: 0:00:20  lr: 0.000300  loss: 4.1893 (4.1761)  time: 0.7992  data: 0.0002  max mem: 6883
Epoch: [26]  [180/195]  eta: 0:00:12  lr: 0.000300  loss: 4.1909 (4.1733)  time: 0.7995  data: 0.0002  max mem: 6883
Epoch: [26]  [190/195]  eta: 0:00:04  lr: 0.000300  loss: 4.1741 (4.1740)  time: 0.7988  data: 0.0001  max mem: 6883
Epoch: [26]  [194/195]  eta: 0:00:00  lr: 0.000300  loss: 4.1638 (4.1730)  time: 0.7986  data: 0.0001  max mem: 6883
Epoch: [26] Total time: 0:02:36 (0.8028 s / it)
Averaged stats: lr: 0.000300  loss: 4.1638 (4.1622)
Test:  [ 0/53]  eta: 0:00:53  loss: 2.9986 (2.9986)  acc1: 34.3750 (34.3750)  acc5: 60.4167 (60.4167)  time: 1.0088  data: 0.6840  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.0473 (3.0492)  acc1: 27.0833 (29.1193)  acc5: 58.8542 (58.7121)  time: 0.3879  data: 0.0625  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.0467 (3.0281)  acc1: 28.6458 (29.3651)  acc5: 58.3333 (59.4742)  time: 0.3239  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.0358 (3.0316)  acc1: 29.6875 (29.3851)  acc5: 58.3333 (59.1902)  time: 0.3244  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.0358 (3.0357)  acc1: 28.6458 (28.9888)  acc5: 58.8542 (59.1972)  time: 0.3249  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.0318 (3.0359)  acc1: 28.1250 (28.8297)  acc5: 58.8542 (59.1708)  time: 0.3254  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 3.0086 (3.0284)  acc1: 28.6458 (28.8400)  acc5: 59.3750 (59.2200)  time: 0.3107  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3340 s / it)
* Acc@1 28.840 Acc@5 59.220 loss 3.028
Accuracy of the network on the 10000 test images: 28.8%
Max accuracy: 28.84%
Epoch: [27]  [  0/195]  eta: 0:04:39  lr: 0.000287  loss: 4.3186 (4.3186)  time: 1.4318  data: 0.6334  max mem: 6883
Epoch: [27]  [ 10/195]  eta: 0:02:38  lr: 0.000287  loss: 4.2245 (4.1769)  time: 0.8582  data: 0.0577  max mem: 6883
Epoch: [27]  [ 20/195]  eta: 0:02:25  lr: 0.000287  loss: 4.2332 (4.1842)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [27]  [ 30/195]  eta: 0:02:15  lr: 0.000287  loss: 4.2485 (4.1834)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [27]  [ 40/195]  eta: 0:02:06  lr: 0.000287  loss: 4.1947 (4.1860)  time: 0.7985  data: 0.0002  max mem: 6883
Epoch: [27]  [ 50/195]  eta: 0:01:57  lr: 0.000287  loss: 4.1871 (4.1825)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [27]  [ 60/195]  eta: 0:01:49  lr: 0.000287  loss: 4.1460 (4.1710)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [27]  [ 70/195]  eta: 0:01:40  lr: 0.000287  loss: 4.0850 (4.1628)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [27]  [ 80/195]  eta: 0:01:32  lr: 0.000287  loss: 4.2417 (4.1713)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [27]  [ 90/195]  eta: 0:01:24  lr: 0.000287  loss: 4.2580 (4.1677)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [27]  [100/195]  eta: 0:01:16  lr: 0.000287  loss: 4.1144 (4.1599)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [27]  [110/195]  eta: 0:01:08  lr: 0.000287  loss: 4.0796 (4.1561)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [27]  [120/195]  eta: 0:01:00  lr: 0.000287  loss: 4.0883 (4.1579)  time: 0.7985  data: 0.0002  max mem: 6883
Epoch: [27]  [130/195]  eta: 0:00:52  lr: 0.000287  loss: 4.2083 (4.1566)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [27]  [140/195]  eta: 0:00:44  lr: 0.000287  loss: 4.2083 (4.1615)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [27]  [150/195]  eta: 0:00:36  lr: 0.000287  loss: 4.2066 (4.1612)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [27]  [160/195]  eta: 0:00:28  lr: 0.000287  loss: 4.1653 (4.1602)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [27]  [170/195]  eta: 0:00:20  lr: 0.000287  loss: 4.1947 (4.1605)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [27]  [180/195]  eta: 0:00:12  lr: 0.000287  loss: 4.2053 (4.1621)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [27]  [190/195]  eta: 0:00:04  lr: 0.000287  loss: 4.2767 (4.1630)  time: 0.7969  data: 0.0001  max mem: 6883
Epoch: [27]  [194/195]  eta: 0:00:00  lr: 0.000287  loss: 4.2767 (4.1618)  time: 0.7970  data: 0.0001  max mem: 6883
Epoch: [27] Total time: 0:02:36 (0.8019 s / it)
Averaged stats: lr: 0.000287  loss: 4.2767 (4.1574)
Test:  [ 0/53]  eta: 0:00:46  loss: 3.0022 (3.0022)  acc1: 36.9792 (36.9792)  acc5: 61.9792 (61.9792)  time: 0.8788  data: 0.5579  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.0226 (3.0283)  acc1: 29.1667 (29.4981)  acc5: 61.9792 (61.2216)  time: 0.3751  data: 0.0510  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 3.0035 (3.0084)  acc1: 30.2083 (30.2331)  acc5: 60.4167 (61.3839)  time: 0.3235  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.0060 (3.0117)  acc1: 30.7292 (30.6116)  acc5: 60.4167 (61.0047)  time: 0.3233  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.0122 (3.0161)  acc1: 30.2083 (30.3862)  acc5: 60.4167 (61.0010)  time: 0.3236  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 3.0058 (3.0172)  acc1: 30.2083 (30.0551)  acc5: 60.4167 (60.8967)  time: 0.3236  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9978 (3.0110)  acc1: 30.2083 (30.1400)  acc5: 60.9375 (60.8900)  time: 0.3088  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3301 s / it)
* Acc@1 30.140 Acc@5 60.890 loss 3.011
Accuracy of the network on the 10000 test images: 30.1%
Max accuracy: 30.14%
Epoch: [28]  [  0/195]  eta: 0:04:15  lr: 0.000275  loss: 4.2906 (4.2906)  time: 1.3104  data: 0.5125  max mem: 6883
Epoch: [28]  [ 10/195]  eta: 0:02:36  lr: 0.000275  loss: 4.2619 (4.2382)  time: 0.8458  data: 0.0468  max mem: 6883
Epoch: [28]  [ 20/195]  eta: 0:02:23  lr: 0.000275  loss: 4.1974 (4.1829)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [28]  [ 30/195]  eta: 0:02:14  lr: 0.000275  loss: 4.2152 (4.1808)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [28]  [ 40/195]  eta: 0:02:05  lr: 0.000275  loss: 4.2152 (4.1803)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [28]  [ 50/195]  eta: 0:01:57  lr: 0.000275  loss: 4.1715 (4.1828)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [28]  [ 60/195]  eta: 0:01:48  lr: 0.000275  loss: 4.2265 (4.1846)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [28]  [ 70/195]  eta: 0:01:40  lr: 0.000275  loss: 4.2302 (4.1807)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [28]  [ 80/195]  eta: 0:01:32  lr: 0.000275  loss: 4.1582 (4.1772)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [28]  [ 90/195]  eta: 0:01:24  lr: 0.000275  loss: 4.1904 (4.1715)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [28]  [100/195]  eta: 0:01:16  lr: 0.000275  loss: 4.0972 (4.1641)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [28]  [110/195]  eta: 0:01:08  lr: 0.000275  loss: 4.2202 (4.1681)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [28]  [120/195]  eta: 0:01:00  lr: 0.000275  loss: 4.1900 (4.1606)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [28]  [130/195]  eta: 0:00:52  lr: 0.000275  loss: 4.1467 (4.1622)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [28]  [140/195]  eta: 0:00:44  lr: 0.000275  loss: 4.1747 (4.1606)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [28]  [150/195]  eta: 0:00:36  lr: 0.000275  loss: 4.0905 (4.1588)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [28]  [160/195]  eta: 0:00:28  lr: 0.000275  loss: 4.0845 (4.1570)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [28]  [170/195]  eta: 0:00:20  lr: 0.000275  loss: 4.1724 (4.1576)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [28]  [180/195]  eta: 0:00:12  lr: 0.000275  loss: 4.2459 (4.1605)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [28]  [190/195]  eta: 0:00:04  lr: 0.000275  loss: 4.1452 (4.1571)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [28]  [194/195]  eta: 0:00:00  lr: 0.000275  loss: 4.1452 (4.1578)  time: 0.7971  data: 0.0001  max mem: 6883
Epoch: [28] Total time: 0:02:36 (0.8009 s / it)
Averaged stats: lr: 0.000275  loss: 4.1452 (4.1517)
Test:  [ 0/53]  eta: 0:00:42  loss: 3.0034 (3.0034)  acc1: 33.3333 (33.3333)  acc5: 61.9792 (61.9792)  time: 0.8030  data: 0.4857  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 3.0399 (3.0187)  acc1: 29.1667 (28.8826)  acc5: 61.9792 (60.8428)  time: 0.3674  data: 0.0445  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9943 (2.9985)  acc1: 29.1667 (29.4891)  acc5: 60.9375 (61.5575)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 3.0040 (3.0049)  acc1: 29.6875 (29.4187)  acc5: 60.9375 (61.1895)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 3.0184 (3.0086)  acc1: 28.6458 (29.2048)  acc5: 60.4167 (61.1535)  time: 0.3232  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 3.0111 (3.0113)  acc1: 28.6458 (28.9624)  acc5: 60.9375 (61.0396)  time: 0.3239  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9925 (3.0026)  acc1: 28.6458 (29.0300)  acc5: 60.9375 (60.9300)  time: 0.3090  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3296 s / it)
* Acc@1 29.030 Acc@5 60.930 loss 3.003
Accuracy of the network on the 10000 test images: 29.0%
Max accuracy: 30.14%
Epoch: [29]  [  0/195]  eta: 0:04:42  lr: 0.000263  loss: 4.0585 (4.0585)  time: 1.4477  data: 0.6466  max mem: 6883
Epoch: [29]  [ 10/195]  eta: 0:02:38  lr: 0.000263  loss: 4.0585 (4.1029)  time: 0.8590  data: 0.0589  max mem: 6883
Epoch: [29]  [ 20/195]  eta: 0:02:25  lr: 0.000263  loss: 4.0592 (4.0821)  time: 0.7986  data: 0.0002  max mem: 6883
Epoch: [29]  [ 30/195]  eta: 0:02:15  lr: 0.000263  loss: 4.1773 (4.1231)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [29]  [ 40/195]  eta: 0:02:06  lr: 0.000263  loss: 4.2233 (4.1372)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [29]  [ 50/195]  eta: 0:01:57  lr: 0.000263  loss: 4.2538 (4.1614)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [29]  [ 60/195]  eta: 0:01:49  lr: 0.000263  loss: 4.2111 (4.1461)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [29]  [ 70/195]  eta: 0:01:40  lr: 0.000263  loss: 4.1053 (4.1352)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [29]  [ 80/195]  eta: 0:01:32  lr: 0.000263  loss: 4.1053 (4.1292)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [29]  [ 90/195]  eta: 0:01:24  lr: 0.000263  loss: 4.1118 (4.1290)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [29]  [100/195]  eta: 0:01:16  lr: 0.000263  loss: 4.1610 (4.1330)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [29]  [110/195]  eta: 0:01:08  lr: 0.000263  loss: 4.1976 (4.1418)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [29]  [120/195]  eta: 0:01:00  lr: 0.000263  loss: 4.1847 (4.1374)  time: 0.7963  data: 0.0002  max mem: 6883
Epoch: [29]  [130/195]  eta: 0:00:52  lr: 0.000263  loss: 4.2074 (4.1446)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [29]  [140/195]  eta: 0:00:44  lr: 0.000263  loss: 4.1779 (4.1375)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [29]  [150/195]  eta: 0:00:36  lr: 0.000263  loss: 4.1250 (4.1380)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [29]  [160/195]  eta: 0:00:28  lr: 0.000263  loss: 4.1759 (4.1359)  time: 0.7985  data: 0.0003  max mem: 6883
Epoch: [29]  [170/195]  eta: 0:00:20  lr: 0.000263  loss: 4.2583 (4.1455)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [29]  [180/195]  eta: 0:00:12  lr: 0.000263  loss: 4.2569 (4.1484)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [29]  [190/195]  eta: 0:00:04  lr: 0.000263  loss: 4.1773 (4.1488)  time: 0.7974  data: 0.0001  max mem: 6883
Epoch: [29]  [194/195]  eta: 0:00:00  lr: 0.000263  loss: 4.1724 (4.1477)  time: 0.7976  data: 0.0001  max mem: 6883
Epoch: [29] Total time: 0:02:36 (0.8014 s / it)
Averaged stats: lr: 0.000263  loss: 4.1724 (4.1327)
Test:  [ 0/53]  eta: 0:00:43  loss: 3.0153 (3.0153)  acc1: 33.3333 (33.3333)  acc5: 63.0208 (63.0208)  time: 0.8124  data: 0.4925  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 3.0081 (2.9883)  acc1: 31.2500 (30.4451)  acc5: 61.4583 (61.2216)  time: 0.3692  data: 0.0452  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9696 (2.9654)  acc1: 31.2500 (31.0764)  acc5: 60.4167 (61.5079)  time: 0.3233  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9696 (2.9659)  acc1: 31.7708 (31.3172)  acc5: 59.8958 (61.1223)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9802 (2.9703)  acc1: 30.2083 (30.8181)  acc5: 59.3750 (61.1153)  time: 0.3228  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 2.9658 (2.9736)  acc1: 29.6875 (30.4739)  acc5: 60.9375 (61.1213)  time: 0.3236  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9481 (2.9678)  acc1: 29.6875 (30.5200)  acc5: 61.9792 (61.1000)  time: 0.3091  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3285 s / it)
* Acc@1 30.520 Acc@5 61.100 loss 2.968
Accuracy of the network on the 10000 test images: 30.5%
Max accuracy: 30.52%
Epoch: [30]  [  0/195]  eta: 0:04:37  lr: 0.000250  loss: 3.7975 (3.7975)  time: 1.4241  data: 0.6251  max mem: 6883
Epoch: [30]  [ 10/195]  eta: 0:02:38  lr: 0.000250  loss: 4.2460 (4.1898)  time: 0.8553  data: 0.0571  max mem: 6883
Epoch: [30]  [ 20/195]  eta: 0:02:24  lr: 0.000250  loss: 4.2460 (4.2183)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [30]  [ 30/195]  eta: 0:02:14  lr: 0.000250  loss: 4.2422 (4.2121)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [30]  [ 40/195]  eta: 0:02:05  lr: 0.000250  loss: 4.1299 (4.1870)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [30]  [ 50/195]  eta: 0:01:57  lr: 0.000250  loss: 3.9337 (4.1362)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [30]  [ 60/195]  eta: 0:01:48  lr: 0.000250  loss: 4.0562 (4.1412)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [30]  [ 70/195]  eta: 0:01:40  lr: 0.000250  loss: 4.1732 (4.1477)  time: 0.7987  data: 0.0003  max mem: 6883
Epoch: [30]  [ 80/195]  eta: 0:01:32  lr: 0.000250  loss: 4.1982 (4.1495)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [30]  [ 90/195]  eta: 0:01:24  lr: 0.000250  loss: 4.2522 (4.1559)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [30]  [100/195]  eta: 0:01:16  lr: 0.000250  loss: 4.1945 (4.1584)  time: 0.8033  data: 0.0002  max mem: 6883
Epoch: [30]  [110/195]  eta: 0:01:08  lr: 0.000250  loss: 4.0651 (4.1445)  time: 0.8034  data: 0.0002  max mem: 6883
Epoch: [30]  [120/195]  eta: 0:01:00  lr: 0.000250  loss: 4.1132 (4.1484)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [30]  [130/195]  eta: 0:00:52  lr: 0.000250  loss: 4.1936 (4.1533)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [30]  [140/195]  eta: 0:00:44  lr: 0.000250  loss: 4.1684 (4.1493)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [30]  [150/195]  eta: 0:00:36  lr: 0.000250  loss: 4.1815 (4.1519)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [30]  [160/195]  eta: 0:00:28  lr: 0.000250  loss: 4.1815 (4.1507)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [30]  [170/195]  eta: 0:00:20  lr: 0.000250  loss: 4.1558 (4.1509)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [30]  [180/195]  eta: 0:00:12  lr: 0.000250  loss: 4.1812 (4.1499)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [30]  [190/195]  eta: 0:00:04  lr: 0.000250  loss: 4.1292 (4.1459)  time: 0.7964  data: 0.0001  max mem: 6883
Epoch: [30]  [194/195]  eta: 0:00:00  lr: 0.000250  loss: 4.1292 (4.1480)  time: 0.7964  data: 0.0001  max mem: 6883
Epoch: [30] Total time: 0:02:36 (0.8018 s / it)
Averaged stats: lr: 0.000250  loss: 4.1292 (4.1517)
Test:  [ 0/53]  eta: 0:00:54  loss: 2.9752 (2.9752)  acc1: 33.3333 (33.3333)  acc5: 62.5000 (62.5000)  time: 1.0308  data: 0.7126  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.9752 (2.9968)  acc1: 30.2083 (30.2557)  acc5: 61.9792 (60.8428)  time: 0.3897  data: 0.0650  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9679 (2.9700)  acc1: 30.7292 (30.7788)  acc5: 61.9792 (61.9544)  time: 0.3234  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9755 (2.9756)  acc1: 30.7292 (30.5108)  acc5: 60.9375 (61.7104)  time: 0.3229  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9870 (2.9798)  acc1: 29.6875 (30.2846)  acc5: 60.9375 (61.6108)  time: 0.3232  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.9869 (2.9816)  acc1: 29.6875 (30.0245)  acc5: 61.4583 (61.5298)  time: 0.3241  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9715 (2.9747)  acc1: 29.6875 (30.0700)  acc5: 61.4583 (61.5300)  time: 0.3095  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3335 s / it)
* Acc@1 30.070 Acc@5 61.530 loss 2.975
Accuracy of the network on the 10000 test images: 30.1%
Max accuracy: 30.52%
Epoch: [31]  [  0/195]  eta: 0:04:58  lr: 0.000238  loss: 4.3662 (4.3662)  time: 1.5309  data: 0.5969  max mem: 6883
Epoch: [31]  [ 10/195]  eta: 0:02:39  lr: 0.000238  loss: 4.1834 (4.1729)  time: 0.8633  data: 0.0544  max mem: 6883
Epoch: [31]  [ 20/195]  eta: 0:02:25  lr: 0.000238  loss: 4.1834 (4.1696)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [31]  [ 30/195]  eta: 0:02:15  lr: 0.000238  loss: 4.1826 (4.1653)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [31]  [ 40/195]  eta: 0:02:06  lr: 0.000238  loss: 4.1826 (4.1666)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [31]  [ 50/195]  eta: 0:01:57  lr: 0.000238  loss: 4.1984 (4.1601)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [31]  [ 60/195]  eta: 0:01:49  lr: 0.000238  loss: 4.1470 (4.1566)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [31]  [ 70/195]  eta: 0:01:40  lr: 0.000238  loss: 4.0559 (4.1481)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [31]  [ 80/195]  eta: 0:01:32  lr: 0.000238  loss: 4.0812 (4.1510)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [31]  [ 90/195]  eta: 0:01:24  lr: 0.000238  loss: 4.1558 (4.1474)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [31]  [100/195]  eta: 0:01:16  lr: 0.000238  loss: 4.0493 (4.1359)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [31]  [110/195]  eta: 0:01:08  lr: 0.000238  loss: 4.0442 (4.1341)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [31]  [120/195]  eta: 0:01:00  lr: 0.000238  loss: 4.2241 (4.1389)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [31]  [130/195]  eta: 0:00:52  lr: 0.000238  loss: 4.2266 (4.1408)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [31]  [140/195]  eta: 0:00:44  lr: 0.000238  loss: 4.1887 (4.1429)  time: 0.7963  data: 0.0003  max mem: 6883
Epoch: [31]  [150/195]  eta: 0:00:36  lr: 0.000238  loss: 4.1887 (4.1447)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [31]  [160/195]  eta: 0:00:28  lr: 0.000238  loss: 4.1753 (4.1455)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [31]  [170/195]  eta: 0:00:20  lr: 0.000238  loss: 4.0344 (4.1354)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [31]  [180/195]  eta: 0:00:12  lr: 0.000238  loss: 4.0504 (4.1370)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [31]  [190/195]  eta: 0:00:04  lr: 0.000238  loss: 4.1476 (4.1388)  time: 0.7962  data: 0.0001  max mem: 6883
Epoch: [31]  [194/195]  eta: 0:00:00  lr: 0.000238  loss: 4.2073 (4.1396)  time: 0.7958  data: 0.0001  max mem: 6883
Epoch: [31] Total time: 0:02:36 (0.8011 s / it)
Averaged stats: lr: 0.000238  loss: 4.2073 (4.1377)
Test:  [ 0/53]  eta: 0:00:50  loss: 2.9985 (2.9985)  acc1: 36.9792 (36.9792)  acc5: 64.0625 (64.0625)  time: 0.9616  data: 0.6424  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.9985 (2.9892)  acc1: 30.7292 (31.6288)  acc5: 63.0208 (62.3106)  time: 0.3819  data: 0.0586  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9621 (2.9674)  acc1: 30.7292 (32.2173)  acc5: 63.0208 (62.9712)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9621 (2.9685)  acc1: 32.8125 (32.5437)  acc5: 62.5000 (62.8360)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9806 (2.9736)  acc1: 29.6875 (31.7454)  acc5: 62.5000 (62.9192)  time: 0.3226  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.9665 (2.9749)  acc1: 31.2500 (31.4951)  acc5: 63.0208 (62.8064)  time: 0.3223  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9441 (2.9668)  acc1: 31.2500 (31.5700)  acc5: 64.0625 (62.8400)  time: 0.3079  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3308 s / it)
* Acc@1 31.570 Acc@5 62.840 loss 2.967
Accuracy of the network on the 10000 test images: 31.6%
Max accuracy: 31.57%
Epoch: [32]  [  0/195]  eta: 0:04:34  lr: 0.000226  loss: 4.0491 (4.0491)  time: 1.4086  data: 0.6129  max mem: 6883
Epoch: [32]  [ 10/195]  eta: 0:02:37  lr: 0.000226  loss: 4.2235 (4.1952)  time: 0.8537  data: 0.0560  max mem: 6883
Epoch: [32]  [ 20/195]  eta: 0:02:25  lr: 0.000226  loss: 4.2104 (4.1563)  time: 0.8003  data: 0.0003  max mem: 6883
Epoch: [32]  [ 30/195]  eta: 0:02:15  lr: 0.000226  loss: 4.2234 (4.1939)  time: 0.8005  data: 0.0003  max mem: 6883
Epoch: [32]  [ 40/195]  eta: 0:02:06  lr: 0.000226  loss: 4.2677 (4.1844)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [32]  [ 50/195]  eta: 0:01:57  lr: 0.000226  loss: 4.2146 (4.1781)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [32]  [ 60/195]  eta: 0:01:49  lr: 0.000226  loss: 4.2146 (4.1887)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [32]  [ 70/195]  eta: 0:01:40  lr: 0.000226  loss: 4.2347 (4.1936)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [32]  [ 80/195]  eta: 0:01:32  lr: 0.000226  loss: 4.2116 (4.1795)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [32]  [ 90/195]  eta: 0:01:24  lr: 0.000226  loss: 4.0850 (4.1739)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [32]  [100/195]  eta: 0:01:16  lr: 0.000226  loss: 4.1572 (4.1686)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [32]  [110/195]  eta: 0:01:08  lr: 0.000226  loss: 4.1627 (4.1684)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [32]  [120/195]  eta: 0:01:00  lr: 0.000226  loss: 4.1627 (4.1649)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [32]  [130/195]  eta: 0:00:52  lr: 0.000226  loss: 4.1920 (4.1654)  time: 0.7962  data: 0.0003  max mem: 6883
Epoch: [32]  [140/195]  eta: 0:00:44  lr: 0.000226  loss: 4.2029 (4.1643)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [32]  [150/195]  eta: 0:00:36  lr: 0.000226  loss: 4.1838 (4.1621)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [32]  [160/195]  eta: 0:00:28  lr: 0.000226  loss: 4.1146 (4.1589)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [32]  [170/195]  eta: 0:00:20  lr: 0.000226  loss: 4.1146 (4.1537)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [32]  [180/195]  eta: 0:00:12  lr: 0.000226  loss: 4.1683 (4.1598)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [32]  [190/195]  eta: 0:00:04  lr: 0.000226  loss: 4.1683 (4.1549)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [32]  [194/195]  eta: 0:00:00  lr: 0.000226  loss: 4.0870 (4.1503)  time: 0.7978  data: 0.0001  max mem: 6883
Epoch: [32] Total time: 0:02:36 (0.8014 s / it)
Averaged stats: lr: 0.000226  loss: 4.0870 (4.1459)
Test:  [ 0/53]  eta: 0:00:42  loss: 2.9775 (2.9775)  acc1: 33.8542 (33.8542)  acc5: 61.4583 (61.4583)  time: 0.8071  data: 0.4846  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 3.0064 (2.9913)  acc1: 30.7292 (30.7292)  acc5: 61.9792 (61.9792)  time: 0.3692  data: 0.0444  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9669 (2.9701)  acc1: 30.7292 (30.9772)  acc5: 62.5000 (62.4504)  time: 0.3234  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9645 (2.9724)  acc1: 31.2500 (31.1324)  acc5: 61.9792 (62.0464)  time: 0.3225  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9858 (2.9796)  acc1: 30.2083 (30.6402)  acc5: 61.4583 (61.8648)  time: 0.3226  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 2.9858 (2.9837)  acc1: 28.1250 (30.2390)  acc5: 61.9792 (61.6728)  time: 0.3225  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9655 (2.9766)  acc1: 29.1667 (30.2900)  acc5: 61.9792 (61.6900)  time: 0.3080  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3286 s / it)
* Acc@1 30.290 Acc@5 61.690 loss 2.977
Accuracy of the network on the 10000 test images: 30.3%
Max accuracy: 31.57%
Epoch: [33]  [  0/195]  eta: 0:04:29  lr: 0.000215  loss: 4.4219 (4.4219)  time: 1.3806  data: 0.5842  max mem: 6883
Epoch: [33]  [ 10/195]  eta: 0:02:37  lr: 0.000215  loss: 4.1309 (4.1436)  time: 0.8518  data: 0.0533  max mem: 6883
Epoch: [33]  [ 20/195]  eta: 0:02:24  lr: 0.000215  loss: 4.1268 (4.1196)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [33]  [ 30/195]  eta: 0:02:14  lr: 0.000215  loss: 4.0076 (4.1036)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [33]  [ 40/195]  eta: 0:02:05  lr: 0.000215  loss: 4.0517 (4.0954)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [33]  [ 50/195]  eta: 0:01:57  lr: 0.000215  loss: 4.0737 (4.0938)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [33]  [ 60/195]  eta: 0:01:48  lr: 0.000215  loss: 4.1494 (4.1013)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [33]  [ 70/195]  eta: 0:01:40  lr: 0.000215  loss: 4.1591 (4.1036)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [33]  [ 80/195]  eta: 0:01:32  lr: 0.000215  loss: 4.1441 (4.1110)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [33]  [ 90/195]  eta: 0:01:24  lr: 0.000215  loss: 4.1733 (4.1116)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [33]  [100/195]  eta: 0:01:16  lr: 0.000215  loss: 4.1796 (4.1147)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [33]  [110/195]  eta: 0:01:08  lr: 0.000215  loss: 4.1872 (4.1214)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [33]  [120/195]  eta: 0:01:00  lr: 0.000215  loss: 4.1540 (4.1201)  time: 0.7990  data: 0.0003  max mem: 6883
Epoch: [33]  [130/195]  eta: 0:00:52  lr: 0.000215  loss: 4.1033 (4.1217)  time: 0.7985  data: 0.0002  max mem: 6883
Epoch: [33]  [140/195]  eta: 0:00:44  lr: 0.000215  loss: 4.1482 (4.1191)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [33]  [150/195]  eta: 0:00:36  lr: 0.000215  loss: 4.1882 (4.1194)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [33]  [160/195]  eta: 0:00:28  lr: 0.000215  loss: 4.1724 (4.1211)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [33]  [170/195]  eta: 0:00:20  lr: 0.000215  loss: 4.0790 (4.1177)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [33]  [180/195]  eta: 0:00:12  lr: 0.000215  loss: 4.0902 (4.1215)  time: 0.7957  data: 0.0002  max mem: 6883
Epoch: [33]  [190/195]  eta: 0:00:04  lr: 0.000215  loss: 4.2239 (4.1251)  time: 0.7951  data: 0.0001  max mem: 6883
Epoch: [33]  [194/195]  eta: 0:00:00  lr: 0.000215  loss: 4.1879 (4.1222)  time: 0.7957  data: 0.0001  max mem: 6883
Epoch: [33] Total time: 0:02:36 (0.8010 s / it)
Averaged stats: lr: 0.000215  loss: 4.1879 (4.1295)
Test:  [ 0/53]  eta: 0:00:53  loss: 2.9622 (2.9622)  acc1: 32.8125 (32.8125)  acc5: 63.0208 (63.0208)  time: 1.0055  data: 0.6871  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 3.0142 (2.9976)  acc1: 29.1667 (29.3087)  acc5: 60.4167 (61.8845)  time: 0.3858  data: 0.0628  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9711 (2.9702)  acc1: 31.2500 (30.3819)  acc5: 60.9375 (62.3264)  time: 0.3224  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9594 (2.9711)  acc1: 31.2500 (30.5276)  acc5: 61.4583 (61.9792)  time: 0.3221  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9663 (2.9739)  acc1: 28.6458 (30.1575)  acc5: 61.4583 (61.9538)  time: 0.3224  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.9663 (2.9761)  acc1: 28.6458 (29.9734)  acc5: 59.8958 (61.6626)  time: 0.3231  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9539 (2.9680)  acc1: 29.1667 (30.0400)  acc5: 60.9375 (61.6600)  time: 0.3086  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3316 s / it)
* Acc@1 30.040 Acc@5 61.660 loss 2.968
Accuracy of the network on the 10000 test images: 30.0%
Max accuracy: 31.57%
Epoch: [34]  [  0/195]  eta: 0:04:50  lr: 0.000204  loss: 3.8470 (3.8470)  time: 1.4891  data: 0.6494  max mem: 6883
Epoch: [34]  [ 10/195]  eta: 0:02:39  lr: 0.000204  loss: 4.1679 (4.1232)  time: 0.8605  data: 0.0592  max mem: 6883
Epoch: [34]  [ 20/195]  eta: 0:02:25  lr: 0.000204  loss: 4.1451 (4.1164)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [34]  [ 30/195]  eta: 0:02:15  lr: 0.000204  loss: 4.1479 (4.1285)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [34]  [ 40/195]  eta: 0:02:06  lr: 0.000204  loss: 4.1731 (4.1077)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [34]  [ 50/195]  eta: 0:01:57  lr: 0.000204  loss: 3.9679 (4.0919)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [34]  [ 60/195]  eta: 0:01:49  lr: 0.000204  loss: 4.0199 (4.0848)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [34]  [ 70/195]  eta: 0:01:40  lr: 0.000204  loss: 4.1522 (4.1016)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [34]  [ 80/195]  eta: 0:01:32  lr: 0.000204  loss: 4.1801 (4.1146)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [34]  [ 90/195]  eta: 0:01:24  lr: 0.000204  loss: 4.1559 (4.1198)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [34]  [100/195]  eta: 0:01:16  lr: 0.000204  loss: 4.2163 (4.1262)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [34]  [110/195]  eta: 0:01:08  lr: 0.000204  loss: 4.1950 (4.1245)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [34]  [120/195]  eta: 0:01:00  lr: 0.000204  loss: 4.0339 (4.1087)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [34]  [130/195]  eta: 0:00:52  lr: 0.000204  loss: 4.0358 (4.1073)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [34]  [140/195]  eta: 0:00:44  lr: 0.000204  loss: 4.0393 (4.1015)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [34]  [150/195]  eta: 0:00:36  lr: 0.000204  loss: 4.1633 (4.1095)  time: 0.7967  data: 0.0003  max mem: 6883
Epoch: [34]  [160/195]  eta: 0:00:28  lr: 0.000204  loss: 4.2096 (4.1119)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [34]  [170/195]  eta: 0:00:20  lr: 0.000204  loss: 4.2096 (4.1161)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [34]  [180/195]  eta: 0:00:12  lr: 0.000204  loss: 4.2132 (4.1168)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [34]  [190/195]  eta: 0:00:04  lr: 0.000204  loss: 4.1038 (4.1147)  time: 0.7962  data: 0.0001  max mem: 6883
Epoch: [34]  [194/195]  eta: 0:00:00  lr: 0.000204  loss: 4.1038 (4.1146)  time: 0.7965  data: 0.0001  max mem: 6883
Epoch: [34] Total time: 0:02:36 (0.8015 s / it)
Averaged stats: lr: 0.000204  loss: 4.1038 (4.1196)
Test:  [ 0/53]  eta: 0:00:49  loss: 2.9457 (2.9457)  acc1: 34.3750 (34.3750)  acc5: 64.0625 (64.0625)  time: 0.9249  data: 0.6014  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.9674 (2.9582)  acc1: 30.7292 (30.5398)  acc5: 61.4583 (62.4527)  time: 0.3784  data: 0.0549  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9248 (2.9372)  acc1: 30.7292 (30.8532)  acc5: 61.4583 (62.6736)  time: 0.3224  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9300 (2.9415)  acc1: 31.2500 (31.1492)  acc5: 60.4167 (61.7776)  time: 0.3222  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9383 (2.9467)  acc1: 29.6875 (30.5894)  acc5: 60.4167 (61.8775)  time: 0.3227  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.9383 (2.9493)  acc1: 29.1667 (30.3513)  acc5: 60.9375 (61.6932)  time: 0.3231  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.9254 (2.9407)  acc1: 29.6875 (30.4200)  acc5: 62.5000 (61.6800)  time: 0.3087  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3302 s / it)
* Acc@1 30.420 Acc@5 61.680 loss 2.941
Accuracy of the network on the 10000 test images: 30.4%
Max accuracy: 31.57%
Epoch: [35]  [  0/195]  eta: 0:04:43  lr: 0.000193  loss: 4.1122 (4.1122)  time: 1.4554  data: 0.6569  max mem: 6883
Epoch: [35]  [ 10/195]  eta: 0:02:38  lr: 0.000193  loss: 4.1160 (4.1172)  time: 0.8587  data: 0.0599  max mem: 6883
Epoch: [35]  [ 20/195]  eta: 0:02:25  lr: 0.000193  loss: 4.2029 (4.1333)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [35]  [ 30/195]  eta: 0:02:15  lr: 0.000193  loss: 4.0485 (4.0793)  time: 0.7988  data: 0.0003  max mem: 6883
Epoch: [35]  [ 40/195]  eta: 0:02:06  lr: 0.000193  loss: 3.9967 (4.0832)  time: 0.7986  data: 0.0003  max mem: 6883
Epoch: [35]  [ 50/195]  eta: 0:01:57  lr: 0.000193  loss: 4.0850 (4.0903)  time: 0.8024  data: 0.0003  max mem: 6883
Epoch: [35]  [ 60/195]  eta: 0:01:49  lr: 0.000193  loss: 4.1682 (4.1008)  time: 0.8011  data: 0.0002  max mem: 6883
Epoch: [35]  [ 70/195]  eta: 0:01:41  lr: 0.000193  loss: 4.1629 (4.0995)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [35]  [ 80/195]  eta: 0:01:32  lr: 0.000193  loss: 4.1051 (4.0876)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [35]  [ 90/195]  eta: 0:01:24  lr: 0.000193  loss: 4.0126 (4.0831)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [35]  [100/195]  eta: 0:01:16  lr: 0.000193  loss: 4.0520 (4.0860)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [35]  [110/195]  eta: 0:01:08  lr: 0.000193  loss: 4.1324 (4.0913)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [35]  [120/195]  eta: 0:01:00  lr: 0.000193  loss: 4.1315 (4.0900)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [35]  [130/195]  eta: 0:00:52  lr: 0.000193  loss: 3.9555 (4.0840)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [35]  [140/195]  eta: 0:00:44  lr: 0.000193  loss: 4.0494 (4.0902)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [35]  [150/195]  eta: 0:00:36  lr: 0.000193  loss: 4.1069 (4.0885)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [35]  [160/195]  eta: 0:00:28  lr: 0.000193  loss: 4.1753 (4.0911)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [35]  [170/195]  eta: 0:00:20  lr: 0.000193  loss: 4.1749 (4.0905)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [35]  [180/195]  eta: 0:00:12  lr: 0.000193  loss: 4.0291 (4.0838)  time: 0.8002  data: 0.0002  max mem: 6883
Epoch: [35]  [190/195]  eta: 0:00:04  lr: 0.000193  loss: 4.0291 (4.0890)  time: 0.8008  data: 0.0001  max mem: 6883
Epoch: [35]  [194/195]  eta: 0:00:00  lr: 0.000193  loss: 4.0517 (4.0891)  time: 0.8006  data: 0.0001  max mem: 6883
Epoch: [35] Total time: 0:02:36 (0.8019 s / it)
Averaged stats: lr: 0.000193  loss: 4.0517 (4.1042)
Test:  [ 0/53]  eta: 0:00:58  loss: 2.8898 (2.8898)  acc1: 35.4167 (35.4167)  acc5: 61.4583 (61.4583)  time: 1.1120  data: 0.7942  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.9055 (2.9073)  acc1: 31.2500 (31.5341)  acc5: 62.5000 (62.8788)  time: 0.3947  data: 0.0725  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8725 (2.8827)  acc1: 32.2917 (32.3909)  acc5: 62.5000 (63.3185)  time: 0.3221  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:08  loss: 2.8792 (2.8864)  acc1: 32.8125 (32.6109)  acc5: 61.9792 (62.9200)  time: 0.3225  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8934 (2.8913)  acc1: 30.7292 (32.2282)  acc5: 61.4583 (62.7287)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8869 (2.8950)  acc1: 30.7292 (31.9240)  acc5: 61.4583 (62.5613)  time: 0.3237  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8742 (2.8878)  acc1: 30.7292 (32.0000)  acc5: 62.5000 (62.6000)  time: 0.3091  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3339 s / it)
* Acc@1 32.000 Acc@5 62.600 loss 2.888
Accuracy of the network on the 10000 test images: 32.0%
Max accuracy: 32.00%
Epoch: [36]  [  0/195]  eta: 0:04:42  lr: 0.000182  loss: 4.0686 (4.0686)  time: 1.4500  data: 0.6529  max mem: 6883
Epoch: [36]  [ 10/195]  eta: 0:02:38  lr: 0.000182  loss: 4.1710 (4.1702)  time: 0.8567  data: 0.0595  max mem: 6883
Epoch: [36]  [ 20/195]  eta: 0:02:24  lr: 0.000182  loss: 4.1566 (4.1205)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [36]  [ 30/195]  eta: 0:02:15  lr: 0.000182  loss: 4.1280 (4.1237)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [36]  [ 40/195]  eta: 0:02:06  lr: 0.000182  loss: 4.2624 (4.1305)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [36]  [ 50/195]  eta: 0:01:57  lr: 0.000182  loss: 4.1486 (4.1301)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [36]  [ 60/195]  eta: 0:01:49  lr: 0.000182  loss: 4.0413 (4.1018)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [36]  [ 70/195]  eta: 0:01:40  lr: 0.000182  loss: 4.0723 (4.1080)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [36]  [ 80/195]  eta: 0:01:32  lr: 0.000182  loss: 4.2059 (4.1213)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [36]  [ 90/195]  eta: 0:01:24  lr: 0.000182  loss: 4.0867 (4.1094)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [36]  [100/195]  eta: 0:01:16  lr: 0.000182  loss: 4.0698 (4.1131)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [36]  [110/195]  eta: 0:01:08  lr: 0.000182  loss: 4.1167 (4.1116)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [36]  [120/195]  eta: 0:01:00  lr: 0.000182  loss: 4.1166 (4.1105)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [36]  [130/195]  eta: 0:00:52  lr: 0.000182  loss: 4.2203 (4.1188)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [36]  [140/195]  eta: 0:00:44  lr: 0.000182  loss: 4.2265 (4.1278)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [36]  [150/195]  eta: 0:00:36  lr: 0.000182  loss: 4.2209 (4.1349)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [36]  [160/195]  eta: 0:00:28  lr: 0.000182  loss: 4.1865 (4.1280)  time: 0.7986  data: 0.0003  max mem: 6883
Epoch: [36]  [170/195]  eta: 0:00:20  lr: 0.000182  loss: 3.9923 (4.1230)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [36]  [180/195]  eta: 0:00:12  lr: 0.000182  loss: 4.1481 (4.1250)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [36]  [190/195]  eta: 0:00:04  lr: 0.000182  loss: 4.1131 (4.1189)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [36]  [194/195]  eta: 0:00:00  lr: 0.000182  loss: 4.0667 (4.1189)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [36] Total time: 0:02:36 (0.8014 s / it)
Averaged stats: lr: 0.000182  loss: 4.0667 (4.1162)
Test:  [ 0/53]  eta: 0:00:42  loss: 2.9097 (2.9097)  acc1: 35.9375 (35.9375)  acc5: 62.5000 (62.5000)  time: 0.8104  data: 0.4877  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 2.9349 (2.9293)  acc1: 31.7708 (31.9602)  acc5: 61.4583 (62.4527)  time: 0.3691  data: 0.0446  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.9011 (2.9035)  acc1: 31.7708 (32.4157)  acc5: 61.4583 (62.8968)  time: 0.3233  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.9011 (2.9054)  acc1: 32.8125 (32.2917)  acc5: 62.5000 (62.4496)  time: 0.3231  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.9264 (2.9119)  acc1: 30.7292 (31.7454)  acc5: 62.5000 (62.4746)  time: 0.3233  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 2.9170 (2.9124)  acc1: 30.7292 (31.5768)  acc5: 61.9792 (62.2855)  time: 0.3235  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8774 (2.9052)  acc1: 30.7292 (31.6300)  acc5: 62.5000 (62.2600)  time: 0.3090  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3289 s / it)
* Acc@1 31.630 Acc@5 62.260 loss 2.905
Accuracy of the network on the 10000 test images: 31.6%
Max accuracy: 32.00%
Epoch: [37]  [  0/195]  eta: 0:04:46  lr: 0.000173  loss: 4.1306 (4.1306)  time: 1.4696  data: 0.6719  max mem: 6883
Epoch: [37]  [ 10/195]  eta: 0:02:38  lr: 0.000173  loss: 4.1801 (4.1496)  time: 0.8594  data: 0.0613  max mem: 6883
Epoch: [37]  [ 20/195]  eta: 0:02:25  lr: 0.000173  loss: 4.0986 (4.1133)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [37]  [ 30/195]  eta: 0:02:15  lr: 0.000173  loss: 4.1146 (4.1340)  time: 0.7960  data: 0.0002  max mem: 6883
Epoch: [37]  [ 40/195]  eta: 0:02:05  lr: 0.000173  loss: 4.1722 (4.1177)  time: 0.7959  data: 0.0002  max mem: 6883
Epoch: [37]  [ 50/195]  eta: 0:01:57  lr: 0.000173  loss: 4.1944 (4.1461)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [37]  [ 60/195]  eta: 0:01:48  lr: 0.000173  loss: 4.2145 (4.1411)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [37]  [ 70/195]  eta: 0:01:40  lr: 0.000173  loss: 4.1708 (4.1246)  time: 0.7956  data: 0.0002  max mem: 6883
Epoch: [37]  [ 80/195]  eta: 0:01:32  lr: 0.000173  loss: 4.0491 (4.1247)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [37]  [ 90/195]  eta: 0:01:24  lr: 0.000173  loss: 4.2039 (4.1295)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [37]  [100/195]  eta: 0:01:16  lr: 0.000173  loss: 4.1216 (4.1246)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [37]  [110/195]  eta: 0:01:08  lr: 0.000173  loss: 4.0712 (4.1195)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [37]  [120/195]  eta: 0:01:00  lr: 0.000173  loss: 4.0671 (4.1180)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [37]  [130/195]  eta: 0:00:52  lr: 0.000173  loss: 4.1239 (4.1251)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [37]  [140/195]  eta: 0:00:44  lr: 0.000173  loss: 4.1239 (4.1227)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [37]  [150/195]  eta: 0:00:36  lr: 0.000173  loss: 4.0486 (4.1187)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [37]  [160/195]  eta: 0:00:28  lr: 0.000173  loss: 4.0514 (4.1165)  time: 0.7958  data: 0.0002  max mem: 6883
Epoch: [37]  [170/195]  eta: 0:00:20  lr: 0.000173  loss: 4.1435 (4.1141)  time: 0.7947  data: 0.0002  max mem: 6883
Epoch: [37]  [180/195]  eta: 0:00:12  lr: 0.000173  loss: 4.1542 (4.1152)  time: 0.7961  data: 0.0002  max mem: 6883
Epoch: [37]  [190/195]  eta: 0:00:04  lr: 0.000173  loss: 4.0891 (4.1115)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [37]  [194/195]  eta: 0:00:00  lr: 0.000173  loss: 4.0891 (4.1126)  time: 0.7965  data: 0.0001  max mem: 6883
Epoch: [37] Total time: 0:02:36 (0.8007 s / it)
Averaged stats: lr: 0.000173  loss: 4.0891 (4.0965)
Test:  [ 0/53]  eta: 0:00:45  loss: 2.8826 (2.8826)  acc1: 33.8542 (33.8542)  acc5: 64.0625 (64.0625)  time: 0.8543  data: 0.5368  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.9309 (2.9045)  acc1: 31.7708 (32.0076)  acc5: 64.0625 (62.6420)  time: 0.3725  data: 0.0491  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8784 (2.8771)  acc1: 33.3333 (32.8621)  acc5: 64.0625 (63.3929)  time: 0.3229  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8784 (2.8808)  acc1: 34.3750 (32.7621)  acc5: 62.5000 (63.0376)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8926 (2.8885)  acc1: 31.7708 (32.3044)  acc5: 61.9792 (62.8811)  time: 0.3232  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8754 (2.8924)  acc1: 30.7292 (32.0364)  acc5: 62.5000 (62.8574)  time: 0.3240  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8594 (2.8854)  acc1: 31.7708 (32.1000)  acc5: 63.5417 (62.8800)  time: 0.3093  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3295 s / it)
* Acc@1 32.100 Acc@5 62.880 loss 2.885
Accuracy of the network on the 10000 test images: 32.1%
Max accuracy: 32.10%
Epoch: [38]  [  0/195]  eta: 0:04:32  lr: 0.000163  loss: 4.0995 (4.0995)  time: 1.3992  data: 0.5769  max mem: 6883
Epoch: [38]  [ 10/195]  eta: 0:02:38  lr: 0.000163  loss: 4.1462 (4.1634)  time: 0.8559  data: 0.0527  max mem: 6883
Epoch: [38]  [ 20/195]  eta: 0:02:24  lr: 0.000163  loss: 4.1553 (4.1130)  time: 0.7992  data: 0.0003  max mem: 6883
Epoch: [38]  [ 30/195]  eta: 0:02:15  lr: 0.000163  loss: 4.0392 (4.0759)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [38]  [ 40/195]  eta: 0:02:05  lr: 0.000163  loss: 3.9579 (4.0600)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [38]  [ 50/195]  eta: 0:01:57  lr: 0.000163  loss: 4.0858 (4.0789)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [38]  [ 60/195]  eta: 0:01:49  lr: 0.000163  loss: 4.2354 (4.0989)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [38]  [ 70/195]  eta: 0:01:40  lr: 0.000163  loss: 4.2143 (4.1002)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [38]  [ 80/195]  eta: 0:01:32  lr: 0.000163  loss: 4.0883 (4.0932)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [38]  [ 90/195]  eta: 0:01:24  lr: 0.000163  loss: 4.0712 (4.0956)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [38]  [100/195]  eta: 0:01:16  lr: 0.000163  loss: 4.0956 (4.0955)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [38]  [110/195]  eta: 0:01:08  lr: 0.000163  loss: 4.0459 (4.0958)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [38]  [120/195]  eta: 0:01:00  lr: 0.000163  loss: 4.0701 (4.0985)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [38]  [130/195]  eta: 0:00:52  lr: 0.000163  loss: 4.1050 (4.0972)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [38]  [140/195]  eta: 0:00:44  lr: 0.000163  loss: 4.1739 (4.1026)  time: 0.7965  data: 0.0002  max mem: 6883
Epoch: [38]  [150/195]  eta: 0:00:36  lr: 0.000163  loss: 4.1616 (4.0956)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [38]  [160/195]  eta: 0:00:28  lr: 0.000163  loss: 3.9607 (4.0870)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [38]  [170/195]  eta: 0:00:20  lr: 0.000163  loss: 3.9751 (4.0877)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [38]  [180/195]  eta: 0:00:12  lr: 0.000163  loss: 4.1126 (4.0892)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [38]  [190/195]  eta: 0:00:04  lr: 0.000163  loss: 4.1444 (4.0873)  time: 0.7992  data: 0.0002  max mem: 6883
Epoch: [38]  [194/195]  eta: 0:00:00  lr: 0.000163  loss: 4.1444 (4.0867)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [38] Total time: 0:02:36 (0.8012 s / it)
Averaged stats: lr: 0.000163  loss: 4.1444 (4.1005)
Test:  [ 0/53]  eta: 0:00:42  loss: 2.8931 (2.8931)  acc1: 35.9375 (35.9375)  acc5: 65.1042 (65.1042)  time: 0.7929  data: 0.4731  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 2.9069 (2.8926)  acc1: 29.6875 (31.3447)  acc5: 64.5833 (63.3049)  time: 0.3670  data: 0.0433  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8666 (2.8670)  acc1: 32.2917 (32.1925)  acc5: 62.5000 (63.6657)  time: 0.3231  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8711 (2.8748)  acc1: 32.2917 (32.0733)  acc5: 62.5000 (63.0376)  time: 0.3231  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8838 (2.8793)  acc1: 31.2500 (31.8344)  acc5: 62.5000 (63.0844)  time: 0.3235  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 2.8773 (2.8820)  acc1: 31.2500 (31.7402)  acc5: 63.0208 (62.9596)  time: 0.3240  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8605 (2.8735)  acc1: 31.2500 (31.7800)  acc5: 64.5833 (62.9800)  time: 0.3094  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3284 s / it)
* Acc@1 31.780 Acc@5 62.980 loss 2.873
Accuracy of the network on the 10000 test images: 31.8%
Max accuracy: 32.10%
Epoch: [39]  [  0/195]  eta: 0:04:19  lr: 0.000154  loss: 3.9241 (3.9241)  time: 1.3284  data: 0.5188  max mem: 6883
Epoch: [39]  [ 10/195]  eta: 0:02:36  lr: 0.000154  loss: 3.9349 (4.0001)  time: 0.8467  data: 0.0474  max mem: 6883
Epoch: [39]  [ 20/195]  eta: 0:02:24  lr: 0.000154  loss: 4.1260 (4.0787)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [39]  [ 30/195]  eta: 0:02:14  lr: 0.000154  loss: 4.2503 (4.1174)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [39]  [ 40/195]  eta: 0:02:05  lr: 0.000154  loss: 4.2074 (4.1091)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [39]  [ 50/195]  eta: 0:01:57  lr: 0.000154  loss: 4.0882 (4.1097)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [39]  [ 60/195]  eta: 0:01:48  lr: 0.000154  loss: 4.1853 (4.1199)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [39]  [ 70/195]  eta: 0:01:40  lr: 0.000154  loss: 4.1892 (4.1122)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [39]  [ 80/195]  eta: 0:01:32  lr: 0.000154  loss: 4.1378 (4.1098)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [39]  [ 90/195]  eta: 0:01:24  lr: 0.000154  loss: 4.1213 (4.1097)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [39]  [100/195]  eta: 0:01:16  lr: 0.000154  loss: 4.1619 (4.1169)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [39]  [110/195]  eta: 0:01:08  lr: 0.000154  loss: 4.1619 (4.1134)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [39]  [120/195]  eta: 0:01:00  lr: 0.000154  loss: 3.9817 (4.0931)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [39]  [130/195]  eta: 0:00:52  lr: 0.000154  loss: 3.9868 (4.0928)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [39]  [140/195]  eta: 0:00:44  lr: 0.000154  loss: 4.1014 (4.0914)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [39]  [150/195]  eta: 0:00:36  lr: 0.000154  loss: 4.0998 (4.0925)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [39]  [160/195]  eta: 0:00:28  lr: 0.000154  loss: 4.0583 (4.0935)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [39]  [170/195]  eta: 0:00:20  lr: 0.000154  loss: 4.0583 (4.0908)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [39]  [180/195]  eta: 0:00:12  lr: 0.000154  loss: 4.0017 (4.0881)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [39]  [190/195]  eta: 0:00:04  lr: 0.000154  loss: 4.0017 (4.0879)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [39]  [194/195]  eta: 0:00:00  lr: 0.000154  loss: 3.9837 (4.0824)  time: 0.7969  data: 0.0001  max mem: 6883
Epoch: [39] Total time: 0:02:36 (0.8010 s / it)
Averaged stats: lr: 0.000154  loss: 3.9837 (4.0941)
Test:  [ 0/53]  eta: 0:00:49  loss: 2.8890 (2.8890)  acc1: 35.9375 (35.9375)  acc5: 62.5000 (62.5000)  time: 0.9285  data: 0.6081  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8890 (2.8658)  acc1: 32.2917 (32.5284)  acc5: 62.5000 (63.1629)  time: 0.3791  data: 0.0555  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8628 (2.8474)  acc1: 32.2917 (32.8621)  acc5: 62.5000 (63.7897)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8563 (2.8518)  acc1: 33.3333 (32.7285)  acc5: 62.5000 (63.3569)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8594 (2.8564)  acc1: 31.7708 (32.4695)  acc5: 62.5000 (63.0844)  time: 0.3233  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8488 (2.8607)  acc1: 31.2500 (31.9138)  acc5: 62.5000 (62.7859)  time: 0.3241  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8374 (2.8541)  acc1: 31.2500 (31.9400)  acc5: 63.0208 (62.8000)  time: 0.3098  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3309 s / it)
* Acc@1 31.940 Acc@5 62.800 loss 2.854
Accuracy of the network on the 10000 test images: 31.9%
Max accuracy: 32.10%
Epoch: [40]  [  0/195]  eta: 0:04:36  lr: 0.000146  loss: 4.5121 (4.5121)  time: 1.4195  data: 0.6181  max mem: 6883
Epoch: [40]  [ 10/195]  eta: 0:02:39  lr: 0.000146  loss: 4.1294 (4.1585)  time: 0.8606  data: 0.0564  max mem: 6883
Epoch: [40]  [ 20/195]  eta: 0:02:25  lr: 0.000146  loss: 4.1294 (4.1764)  time: 0.8001  data: 0.0002  max mem: 6883
Epoch: [40]  [ 30/195]  eta: 0:02:15  lr: 0.000146  loss: 4.1214 (4.1423)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [40]  [ 40/195]  eta: 0:02:06  lr: 0.000146  loss: 4.0406 (4.1046)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [40]  [ 50/195]  eta: 0:01:57  lr: 0.000146  loss: 4.1533 (4.1147)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [40]  [ 60/195]  eta: 0:01:49  lr: 0.000146  loss: 4.0621 (4.0995)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [40]  [ 70/195]  eta: 0:01:40  lr: 0.000146  loss: 4.0547 (4.1064)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [40]  [ 80/195]  eta: 0:01:32  lr: 0.000146  loss: 4.1833 (4.1028)  time: 0.7983  data: 0.0003  max mem: 6883
Epoch: [40]  [ 90/195]  eta: 0:01:24  lr: 0.000146  loss: 4.1833 (4.1085)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [40]  [100/195]  eta: 0:01:16  lr: 0.000146  loss: 4.1478 (4.1088)  time: 0.7987  data: 0.0003  max mem: 6883
Epoch: [40]  [110/195]  eta: 0:01:08  lr: 0.000146  loss: 4.1478 (4.1155)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [40]  [120/195]  eta: 0:01:00  lr: 0.000146  loss: 4.1622 (4.1118)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [40]  [130/195]  eta: 0:00:52  lr: 0.000146  loss: 4.0742 (4.1091)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [40]  [140/195]  eta: 0:00:44  lr: 0.000146  loss: 4.1228 (4.1108)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [40]  [150/195]  eta: 0:00:36  lr: 0.000146  loss: 4.1748 (4.1100)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [40]  [160/195]  eta: 0:00:28  lr: 0.000146  loss: 4.1113 (4.1044)  time: 0.7968  data: 0.0003  max mem: 6883
Epoch: [40]  [170/195]  eta: 0:00:20  lr: 0.000146  loss: 4.0681 (4.0997)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [40]  [180/195]  eta: 0:00:12  lr: 0.000146  loss: 4.0694 (4.0966)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [40]  [190/195]  eta: 0:00:04  lr: 0.000146  loss: 4.0694 (4.0969)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [40]  [194/195]  eta: 0:00:00  lr: 0.000146  loss: 4.1109 (4.0984)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [40] Total time: 0:02:36 (0.8016 s / it)
Averaged stats: lr: 0.000146  loss: 4.1109 (4.1014)
Test:  [ 0/53]  eta: 0:00:46  loss: 2.8686 (2.8686)  acc1: 36.4583 (36.4583)  acc5: 65.6250 (65.6250)  time: 0.8849  data: 0.5646  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8867 (2.8836)  acc1: 33.8542 (33.0966)  acc5: 63.5417 (63.0682)  time: 0.3761  data: 0.0517  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8526 (2.8580)  acc1: 33.3333 (33.3333)  acc5: 64.0625 (64.3353)  time: 0.3238  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8491 (2.8606)  acc1: 33.3333 (33.4677)  acc5: 64.5833 (63.8609)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8613 (2.8635)  acc1: 31.7708 (33.0539)  acc5: 64.0625 (63.9990)  time: 0.3233  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8365 (2.8666)  acc1: 32.2917 (32.8840)  acc5: 64.5833 (63.8480)  time: 0.3236  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8269 (2.8593)  acc1: 32.2917 (32.8900)  acc5: 64.5833 (63.8700)  time: 0.3094  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3302 s / it)
* Acc@1 32.890 Acc@5 63.870 loss 2.859
Accuracy of the network on the 10000 test images: 32.9%
Max accuracy: 32.89%
Epoch: [41]  [  0/195]  eta: 0:04:16  lr: 0.000138  loss: 4.1636 (4.1636)  time: 1.3138  data: 0.5141  max mem: 6883
Epoch: [41]  [ 10/195]  eta: 0:02:36  lr: 0.000138  loss: 4.1092 (4.0764)  time: 0.8444  data: 0.0469  max mem: 6883
Epoch: [41]  [ 20/195]  eta: 0:02:23  lr: 0.000138  loss: 4.1092 (4.0863)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [41]  [ 30/195]  eta: 0:02:14  lr: 0.000138  loss: 4.1101 (4.0582)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [41]  [ 40/195]  eta: 0:02:05  lr: 0.000138  loss: 4.1058 (4.0642)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [41]  [ 50/195]  eta: 0:01:57  lr: 0.000138  loss: 4.0979 (4.0660)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [41]  [ 60/195]  eta: 0:01:48  lr: 0.000138  loss: 3.9741 (4.0486)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [41]  [ 70/195]  eta: 0:01:40  lr: 0.000138  loss: 4.0247 (4.0497)  time: 0.7986  data: 0.0003  max mem: 6883
Epoch: [41]  [ 80/195]  eta: 0:01:32  lr: 0.000138  loss: 4.2625 (4.0669)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [41]  [ 90/195]  eta: 0:01:24  lr: 0.000138  loss: 4.1890 (4.0687)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [41]  [100/195]  eta: 0:01:16  lr: 0.000138  loss: 4.1233 (4.0786)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [41]  [110/195]  eta: 0:01:08  lr: 0.000138  loss: 4.1148 (4.0655)  time: 0.7985  data: 0.0002  max mem: 6883
Epoch: [41]  [120/195]  eta: 0:01:00  lr: 0.000138  loss: 3.9165 (4.0618)  time: 0.7992  data: 0.0003  max mem: 6883
Epoch: [41]  [130/195]  eta: 0:00:52  lr: 0.000138  loss: 4.0248 (4.0670)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [41]  [140/195]  eta: 0:00:44  lr: 0.000138  loss: 4.0495 (4.0661)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [41]  [150/195]  eta: 0:00:36  lr: 0.000138  loss: 4.0940 (4.0738)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [41]  [160/195]  eta: 0:00:28  lr: 0.000138  loss: 4.1114 (4.0728)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [41]  [170/195]  eta: 0:00:20  lr: 0.000138  loss: 4.0289 (4.0708)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [41]  [180/195]  eta: 0:00:12  lr: 0.000138  loss: 4.0552 (4.0706)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [41]  [190/195]  eta: 0:00:04  lr: 0.000138  loss: 4.1279 (4.0721)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [41]  [194/195]  eta: 0:00:00  lr: 0.000138  loss: 4.1279 (4.0696)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [41] Total time: 0:02:36 (0.8010 s / it)
Averaged stats: lr: 0.000138  loss: 4.1279 (4.0706)
Test:  [ 0/53]  eta: 0:00:51  loss: 2.8677 (2.8677)  acc1: 34.3750 (34.3750)  acc5: 64.0625 (64.0625)  time: 0.9685  data: 0.6489  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8677 (2.8726)  acc1: 32.8125 (32.4337)  acc5: 64.0625 (63.0208)  time: 0.3830  data: 0.0592  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8489 (2.8452)  acc1: 32.8125 (32.8869)  acc5: 64.0625 (63.5417)  time: 0.3229  data: 0.0002  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8478 (2.8481)  acc1: 33.8542 (33.0477)  acc5: 61.9792 (63.3401)  time: 0.3228  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8478 (2.8525)  acc1: 33.3333 (32.8760)  acc5: 64.0625 (63.5798)  time: 0.3229  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8381 (2.8547)  acc1: 32.8125 (32.7206)  acc5: 64.0625 (63.4804)  time: 0.3233  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8289 (2.8479)  acc1: 32.8125 (32.7700)  acc5: 64.0625 (63.4100)  time: 0.3089  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3314 s / it)
* Acc@1 32.770 Acc@5 63.410 loss 2.848
Accuracy of the network on the 10000 test images: 32.8%
Max accuracy: 32.89%
Epoch: [42]  [  0/195]  eta: 0:04:53  lr: 0.000131  loss: 4.1466 (4.1466)  time: 1.5051  data: 0.7051  max mem: 6883
Epoch: [42]  [ 10/195]  eta: 0:02:39  lr: 0.000131  loss: 4.1369 (4.0825)  time: 0.8629  data: 0.0643  max mem: 6883
Epoch: [42]  [ 20/195]  eta: 0:02:25  lr: 0.000131  loss: 4.0685 (4.0865)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [42]  [ 30/195]  eta: 0:02:15  lr: 0.000131  loss: 4.0685 (4.0586)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [42]  [ 40/195]  eta: 0:02:06  lr: 0.000131  loss: 4.1402 (4.0693)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [42]  [ 50/195]  eta: 0:01:57  lr: 0.000131  loss: 4.1392 (4.0612)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [42]  [ 60/195]  eta: 0:01:49  lr: 0.000131  loss: 4.0588 (4.0592)  time: 0.7985  data: 0.0002  max mem: 6883
Epoch: [42]  [ 70/195]  eta: 0:01:40  lr: 0.000131  loss: 4.1641 (4.0759)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [42]  [ 80/195]  eta: 0:01:32  lr: 0.000131  loss: 4.2261 (4.0940)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [42]  [ 90/195]  eta: 0:01:24  lr: 0.000131  loss: 4.1725 (4.0854)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [42]  [100/195]  eta: 0:01:16  lr: 0.000131  loss: 4.0734 (4.0907)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [42]  [110/195]  eta: 0:01:08  lr: 0.000131  loss: 4.1168 (4.0889)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [42]  [120/195]  eta: 0:01:00  lr: 0.000131  loss: 4.0410 (4.0851)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [42]  [130/195]  eta: 0:00:52  lr: 0.000131  loss: 4.0574 (4.0884)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [42]  [140/195]  eta: 0:00:44  lr: 0.000131  loss: 4.1674 (4.0922)  time: 0.7980  data: 0.0002  max mem: 6883
Epoch: [42]  [150/195]  eta: 0:00:36  lr: 0.000131  loss: 4.1674 (4.0886)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [42]  [160/195]  eta: 0:00:28  lr: 0.000131  loss: 4.0217 (4.0837)  time: 0.7993  data: 0.0003  max mem: 6883
Epoch: [42]  [170/195]  eta: 0:00:20  lr: 0.000131  loss: 4.1250 (4.0866)  time: 0.7990  data: 0.0003  max mem: 6883
Epoch: [42]  [180/195]  eta: 0:00:12  lr: 0.000131  loss: 4.1281 (4.0853)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [42]  [190/195]  eta: 0:00:04  lr: 0.000131  loss: 4.1178 (4.0891)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [42]  [194/195]  eta: 0:00:00  lr: 0.000131  loss: 4.1081 (4.0898)  time: 0.7983  data: 0.0001  max mem: 6883
Epoch: [42] Total time: 0:02:36 (0.8022 s / it)
Averaged stats: lr: 0.000131  loss: 4.1081 (4.0951)
Test:  [ 0/53]  eta: 0:00:56  loss: 2.8739 (2.8739)  acc1: 35.9375 (35.9375)  acc5: 64.5833 (64.5833)  time: 1.0682  data: 0.7495  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8739 (2.8596)  acc1: 32.8125 (33.6648)  acc5: 64.5833 (64.6307)  time: 0.3918  data: 0.0685  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8300 (2.8365)  acc1: 33.8542 (34.4246)  acc5: 64.5833 (64.8314)  time: 0.3229  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8311 (2.8403)  acc1: 34.8958 (34.4422)  acc5: 64.0625 (64.4825)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8501 (2.8441)  acc1: 33.8542 (34.1082)  acc5: 64.5833 (64.3928)  time: 0.3231  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8286 (2.8466)  acc1: 33.3333 (33.7725)  acc5: 63.0208 (64.2974)  time: 0.3236  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8041 (2.8393)  acc1: 33.8542 (33.8500)  acc5: 64.5833 (64.2900)  time: 0.3092  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3335 s / it)
* Acc@1 33.850 Acc@5 64.290 loss 2.839
Accuracy of the network on the 10000 test images: 33.9%
Max accuracy: 33.85%
Epoch: [43]  [  0/195]  eta: 0:04:35  lr: 0.000125  loss: 4.1632 (4.1632)  time: 1.4133  data: 0.6141  max mem: 6883
Epoch: [43]  [ 10/195]  eta: 0:02:38  lr: 0.000125  loss: 4.1446 (4.1342)  time: 0.8554  data: 0.0560  max mem: 6883
Epoch: [43]  [ 20/195]  eta: 0:02:24  lr: 0.000125  loss: 4.1094 (4.0896)  time: 0.7987  data: 0.0003  max mem: 6883
Epoch: [43]  [ 30/195]  eta: 0:02:15  lr: 0.000125  loss: 4.0478 (4.0701)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [43]  [ 40/195]  eta: 0:02:06  lr: 0.000125  loss: 4.0692 (4.0742)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [43]  [ 50/195]  eta: 0:01:57  lr: 0.000125  loss: 4.1017 (4.0664)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [43]  [ 60/195]  eta: 0:01:49  lr: 0.000125  loss: 4.1963 (4.0907)  time: 0.7986  data: 0.0003  max mem: 6883
Epoch: [43]  [ 70/195]  eta: 0:01:40  lr: 0.000125  loss: 4.1963 (4.0945)  time: 0.7995  data: 0.0002  max mem: 6883
Epoch: [43]  [ 80/195]  eta: 0:01:32  lr: 0.000125  loss: 4.1550 (4.0953)  time: 0.7993  data: 0.0002  max mem: 6883
Epoch: [43]  [ 90/195]  eta: 0:01:24  lr: 0.000125  loss: 4.1570 (4.0879)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [43]  [100/195]  eta: 0:01:16  lr: 0.000125  loss: 3.9242 (4.0765)  time: 0.7984  data: 0.0003  max mem: 6883
Epoch: [43]  [110/195]  eta: 0:01:08  lr: 0.000125  loss: 4.0301 (4.0877)  time: 0.7991  data: 0.0003  max mem: 6883
Epoch: [43]  [120/195]  eta: 0:01:00  lr: 0.000125  loss: 4.1962 (4.0937)  time: 0.7995  data: 0.0003  max mem: 6883
Epoch: [43]  [130/195]  eta: 0:00:52  lr: 0.000125  loss: 4.0986 (4.0957)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [43]  [140/195]  eta: 0:00:44  lr: 0.000125  loss: 4.1519 (4.0975)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [43]  [150/195]  eta: 0:00:36  lr: 0.000125  loss: 4.1742 (4.0991)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [43]  [160/195]  eta: 0:00:28  lr: 0.000125  loss: 4.1558 (4.0999)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [43]  [170/195]  eta: 0:00:20  lr: 0.000125  loss: 4.1547 (4.1020)  time: 0.7994  data: 0.0003  max mem: 6883
Epoch: [43]  [180/195]  eta: 0:00:12  lr: 0.000125  loss: 4.1547 (4.1021)  time: 0.7996  data: 0.0003  max mem: 6883
Epoch: [43]  [190/195]  eta: 0:00:04  lr: 0.000125  loss: 4.1612 (4.1011)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [43]  [194/195]  eta: 0:00:00  lr: 0.000125  loss: 4.1929 (4.0998)  time: 0.7991  data: 0.0002  max mem: 6883
Epoch: [43] Total time: 0:02:36 (0.8022 s / it)
Averaged stats: lr: 0.000125  loss: 4.1929 (4.0890)
Test:  [ 0/53]  eta: 0:00:44  loss: 2.8827 (2.8827)  acc1: 33.8542 (33.8542)  acc5: 63.0208 (63.0208)  time: 0.8394  data: 0.5161  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 2.8827 (2.8630)  acc1: 33.3333 (32.7178)  acc5: 63.0208 (63.8258)  time: 0.3720  data: 0.0472  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8493 (2.8365)  acc1: 33.3333 (33.8542)  acc5: 63.5417 (64.4345)  time: 0.3235  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8379 (2.8406)  acc1: 34.3750 (33.7198)  acc5: 63.5417 (64.0961)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8379 (2.8445)  acc1: 32.8125 (33.3460)  acc5: 64.5833 (64.4055)  time: 0.3235  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8293 (2.8461)  acc1: 32.8125 (33.2618)  acc5: 65.1042 (64.2157)  time: 0.3238  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.8158 (2.8382)  acc1: 32.8125 (33.3200)  acc5: 65.6250 (64.2000)  time: 0.3094  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3294 s / it)
* Acc@1 33.320 Acc@5 64.200 loss 2.838
Accuracy of the network on the 10000 test images: 33.3%
Max accuracy: 33.85%
Epoch: [44]  [  0/195]  eta: 0:04:21  lr: 0.000119  loss: 3.9742 (3.9742)  time: 1.3404  data: 0.5417  max mem: 6883
Epoch: [44]  [ 10/195]  eta: 0:02:37  lr: 0.000119  loss: 4.1075 (4.0480)  time: 0.8493  data: 0.0494  max mem: 6883
Epoch: [44]  [ 20/195]  eta: 0:02:24  lr: 0.000119  loss: 4.1642 (4.0957)  time: 0.7987  data: 0.0002  max mem: 6883
Epoch: [44]  [ 30/195]  eta: 0:02:14  lr: 0.000119  loss: 4.1642 (4.1061)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [44]  [ 40/195]  eta: 0:02:05  lr: 0.000119  loss: 4.1722 (4.1245)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [44]  [ 50/195]  eta: 0:01:57  lr: 0.000119  loss: 4.1970 (4.1141)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [44]  [ 60/195]  eta: 0:01:48  lr: 0.000119  loss: 4.1386 (4.1033)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [44]  [ 70/195]  eta: 0:01:40  lr: 0.000119  loss: 4.0643 (4.0914)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [44]  [ 80/195]  eta: 0:01:32  lr: 0.000119  loss: 4.0054 (4.0773)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [44]  [ 90/195]  eta: 0:01:24  lr: 0.000119  loss: 4.0054 (4.0792)  time: 0.7981  data: 0.0003  max mem: 6883
Epoch: [44]  [100/195]  eta: 0:01:16  lr: 0.000119  loss: 4.2104 (4.0819)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [44]  [110/195]  eta: 0:01:08  lr: 0.000119  loss: 4.1026 (4.0759)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [44]  [120/195]  eta: 0:01:00  lr: 0.000119  loss: 4.0995 (4.0756)  time: 0.7984  data: 0.0002  max mem: 6883
Epoch: [44]  [130/195]  eta: 0:00:52  lr: 0.000119  loss: 4.1835 (4.0792)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [44]  [140/195]  eta: 0:00:44  lr: 0.000119  loss: 4.0435 (4.0776)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [44]  [150/195]  eta: 0:00:36  lr: 0.000119  loss: 4.0435 (4.0776)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [44]  [160/195]  eta: 0:00:28  lr: 0.000119  loss: 4.1453 (4.0844)  time: 0.8010  data: 0.0002  max mem: 6883
Epoch: [44]  [170/195]  eta: 0:00:20  lr: 0.000119  loss: 4.1512 (4.0875)  time: 0.8014  data: 0.0002  max mem: 6883
Epoch: [44]  [180/195]  eta: 0:00:12  lr: 0.000119  loss: 4.1493 (4.0911)  time: 0.7992  data: 0.0002  max mem: 6883
Epoch: [44]  [190/195]  eta: 0:00:04  lr: 0.000119  loss: 4.1416 (4.0908)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [44]  [194/195]  eta: 0:00:00  lr: 0.000119  loss: 4.1529 (4.0939)  time: 0.7980  data: 0.0001  max mem: 6883
Epoch: [44] Total time: 0:02:36 (0.8017 s / it)
Averaged stats: lr: 0.000119  loss: 4.1529 (4.0860)
Test:  [ 0/53]  eta: 0:00:50  loss: 2.8480 (2.8480)  acc1: 36.9792 (36.9792)  acc5: 65.6250 (65.6250)  time: 0.9461  data: 0.6260  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8501 (2.8396)  acc1: 34.8958 (33.5701)  acc5: 65.6250 (64.0152)  time: 0.3810  data: 0.0572  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8198 (2.8151)  acc1: 34.8958 (34.2758)  acc5: 63.5417 (64.4593)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8198 (2.8171)  acc1: 34.3750 (34.4254)  acc5: 63.5417 (64.3145)  time: 0.3231  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8249 (2.8234)  acc1: 33.3333 (33.7907)  acc5: 63.5417 (64.3674)  time: 0.3233  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8189 (2.8245)  acc1: 32.2917 (33.7112)  acc5: 64.5833 (64.2974)  time: 0.3237  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.7983 (2.8169)  acc1: 33.3333 (33.7600)  acc5: 65.6250 (64.3000)  time: 0.3089  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3314 s / it)
* Acc@1 33.760 Acc@5 64.300 loss 2.817
Accuracy of the network on the 10000 test images: 33.8%
Max accuracy: 33.85%
Epoch: [45]  [  0/195]  eta: 0:04:37  lr: 0.000114  loss: 3.9436 (3.9436)  time: 1.4217  data: 0.6244  max mem: 6883
Epoch: [45]  [ 10/195]  eta: 0:02:38  lr: 0.000114  loss: 4.0850 (4.0747)  time: 0.8546  data: 0.0569  max mem: 6883
Epoch: [45]  [ 20/195]  eta: 0:02:24  lr: 0.000114  loss: 4.1893 (4.1242)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [45]  [ 30/195]  eta: 0:02:14  lr: 0.000114  loss: 4.0909 (4.0715)  time: 0.7971  data: 0.0002  max mem: 6883
Epoch: [45]  [ 40/195]  eta: 0:02:05  lr: 0.000114  loss: 4.0018 (4.0699)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [45]  [ 50/195]  eta: 0:01:57  lr: 0.000114  loss: 4.1113 (4.0819)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [45]  [ 60/195]  eta: 0:01:48  lr: 0.000114  loss: 4.1257 (4.0826)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [45]  [ 70/195]  eta: 0:01:40  lr: 0.000114  loss: 4.1378 (4.0933)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [45]  [ 80/195]  eta: 0:01:32  lr: 0.000114  loss: 4.1116 (4.0845)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [45]  [ 90/195]  eta: 0:01:24  lr: 0.000114  loss: 4.0706 (4.0813)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [45]  [100/195]  eta: 0:01:16  lr: 0.000114  loss: 4.1482 (4.0808)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [45]  [110/195]  eta: 0:01:08  lr: 0.000114  loss: 4.1925 (4.0820)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [45]  [120/195]  eta: 0:01:00  lr: 0.000114  loss: 4.0577 (4.0771)  time: 0.7984  data: 0.0003  max mem: 6883
Epoch: [45]  [130/195]  eta: 0:00:52  lr: 0.000114  loss: 4.0212 (4.0742)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [45]  [140/195]  eta: 0:00:44  lr: 0.000114  loss: 4.1379 (4.0790)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [45]  [150/195]  eta: 0:00:36  lr: 0.000114  loss: 4.1786 (4.0818)  time: 0.7978  data: 0.0002  max mem: 6883
Epoch: [45]  [160/195]  eta: 0:00:28  lr: 0.000114  loss: 4.1870 (4.0884)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [45]  [170/195]  eta: 0:00:20  lr: 0.000114  loss: 4.1208 (4.0881)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [45]  [180/195]  eta: 0:00:12  lr: 0.000114  loss: 4.0527 (4.0814)  time: 0.7980  data: 0.0003  max mem: 6883
Epoch: [45]  [190/195]  eta: 0:00:04  lr: 0.000114  loss: 4.1319 (4.0892)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [45]  [194/195]  eta: 0:00:00  lr: 0.000114  loss: 4.1809 (4.0916)  time: 0.7970  data: 0.0001  max mem: 6883
Epoch: [45] Total time: 0:02:36 (0.8012 s / it)
Averaged stats: lr: 0.000114  loss: 4.1809 (4.0841)
Test:  [ 0/53]  eta: 0:00:45  loss: 2.8148 (2.8148)  acc1: 34.8958 (34.8958)  acc5: 65.1042 (65.1042)  time: 0.8541  data: 0.5346  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8596 (2.8382)  acc1: 32.8125 (33.6174)  acc5: 66.1458 (64.6780)  time: 0.3727  data: 0.0489  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8078 (2.8094)  acc1: 32.8125 (34.3502)  acc5: 65.6250 (65.3274)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8078 (2.8113)  acc1: 34.3750 (34.1062)  acc5: 64.5833 (65.1546)  time: 0.3234  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8264 (2.8136)  acc1: 32.2917 (33.7271)  acc5: 64.5833 (65.3074)  time: 0.3236  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8025 (2.8169)  acc1: 32.2917 (33.4661)  acc5: 65.1042 (65.0633)  time: 0.3237  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.7819 (2.8095)  acc1: 32.8125 (33.5500)  acc5: 65.1042 (65.0800)  time: 0.3094  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3299 s / it)
* Acc@1 33.550 Acc@5 65.080 loss 2.810
Accuracy of the network on the 10000 test images: 33.6%
Max accuracy: 33.85%
Epoch: [46]  [  0/195]  eta: 0:04:31  lr: 0.000110  loss: 4.3560 (4.3560)  time: 1.3942  data: 0.5692  max mem: 6883
Epoch: [46]  [ 10/195]  eta: 0:02:37  lr: 0.000110  loss: 4.0267 (4.0151)  time: 0.8529  data: 0.0519  max mem: 6883
Epoch: [46]  [ 20/195]  eta: 0:02:24  lr: 0.000110  loss: 4.1328 (4.1127)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [46]  [ 30/195]  eta: 0:02:14  lr: 0.000110  loss: 4.0806 (4.0431)  time: 0.7969  data: 0.0003  max mem: 6883
Epoch: [46]  [ 40/195]  eta: 0:02:05  lr: 0.000110  loss: 3.9193 (4.0353)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [46]  [ 50/195]  eta: 0:01:57  lr: 0.000110  loss: 4.0201 (4.0533)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [46]  [ 60/195]  eta: 0:01:48  lr: 0.000110  loss: 4.1284 (4.0616)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [46]  [ 70/195]  eta: 0:01:40  lr: 0.000110  loss: 4.1127 (4.0651)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [46]  [ 80/195]  eta: 0:01:32  lr: 0.000110  loss: 3.9830 (4.0468)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [46]  [ 90/195]  eta: 0:01:24  lr: 0.000110  loss: 4.0295 (4.0523)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [46]  [100/195]  eta: 0:01:16  lr: 0.000110  loss: 4.1447 (4.0508)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [46]  [110/195]  eta: 0:01:08  lr: 0.000110  loss: 4.1447 (4.0572)  time: 0.8011  data: 0.0003  max mem: 6883
Epoch: [46]  [120/195]  eta: 0:01:00  lr: 0.000110  loss: 4.1383 (4.0618)  time: 0.8010  data: 0.0002  max mem: 6883
Epoch: [46]  [130/195]  eta: 0:00:52  lr: 0.000110  loss: 4.1175 (4.0586)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [46]  [140/195]  eta: 0:00:44  lr: 0.000110  loss: 4.1175 (4.0694)  time: 0.7962  data: 0.0002  max mem: 6883
Epoch: [46]  [150/195]  eta: 0:00:36  lr: 0.000110  loss: 4.1611 (4.0628)  time: 0.7966  data: 0.0003  max mem: 6883
Epoch: [46]  [160/195]  eta: 0:00:28  lr: 0.000110  loss: 4.0414 (4.0643)  time: 0.7968  data: 0.0002  max mem: 6883
Epoch: [46]  [170/195]  eta: 0:00:20  lr: 0.000110  loss: 4.0414 (4.0636)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [46]  [180/195]  eta: 0:00:12  lr: 0.000110  loss: 4.0038 (4.0645)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [46]  [190/195]  eta: 0:00:04  lr: 0.000110  loss: 4.0127 (4.0635)  time: 0.7975  data: 0.0002  max mem: 6883
Epoch: [46]  [194/195]  eta: 0:00:00  lr: 0.000110  loss: 4.0855 (4.0665)  time: 0.7969  data: 0.0001  max mem: 6883
Epoch: [46] Total time: 0:02:36 (0.8012 s / it)
Averaged stats: lr: 0.000110  loss: 4.0855 (4.0716)
Test:  [ 0/53]  eta: 0:00:55  loss: 2.8185 (2.8185)  acc1: 36.4583 (36.4583)  acc5: 66.1458 (66.1458)  time: 1.0528  data: 0.7292  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8254 (2.8249)  acc1: 32.8125 (33.6174)  acc5: 66.1458 (65.3409)  time: 0.3899  data: 0.0666  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.7995 (2.8008)  acc1: 33.8542 (34.0774)  acc5: 65.6250 (65.6002)  time: 0.3226  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.7995 (2.8051)  acc1: 33.8542 (33.9382)  acc5: 64.0625 (64.9530)  time: 0.3230  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8196 (2.8095)  acc1: 32.2917 (33.6001)  acc5: 63.5417 (64.9009)  time: 0.3232  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.8175 (2.8116)  acc1: 32.2917 (33.3231)  acc5: 64.5833 (64.8999)  time: 0.3234  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.7853 (2.8036)  acc1: 32.8125 (33.4200)  acc5: 65.1042 (64.8500)  time: 0.3089  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3330 s / it)
* Acc@1 33.420 Acc@5 64.850 loss 2.804
Accuracy of the network on the 10000 test images: 33.4%
Max accuracy: 33.85%
Epoch: [47]  [  0/195]  eta: 0:04:46  lr: 0.000106  loss: 4.3569 (4.3569)  time: 1.4667  data: 0.5759  max mem: 6883
Epoch: [47]  [ 10/195]  eta: 0:02:38  lr: 0.000106  loss: 4.0669 (4.0051)  time: 0.8583  data: 0.0526  max mem: 6883
Epoch: [47]  [ 20/195]  eta: 0:02:25  lr: 0.000106  loss: 4.0669 (4.0218)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [47]  [ 30/195]  eta: 0:02:15  lr: 0.000106  loss: 4.1058 (4.0392)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [47]  [ 40/195]  eta: 0:02:06  lr: 0.000106  loss: 4.1988 (4.0539)  time: 0.7979  data: 0.0003  max mem: 6883
Epoch: [47]  [ 50/195]  eta: 0:01:57  lr: 0.000106  loss: 4.2321 (4.0687)  time: 0.7974  data: 0.0003  max mem: 6883
Epoch: [47]  [ 60/195]  eta: 0:01:49  lr: 0.000106  loss: 4.2109 (4.0786)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [47]  [ 70/195]  eta: 0:01:40  lr: 0.000106  loss: 4.0975 (4.0666)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [47]  [ 80/195]  eta: 0:01:32  lr: 0.000106  loss: 4.0824 (4.0720)  time: 0.7970  data: 0.0003  max mem: 6883
Epoch: [47]  [ 90/195]  eta: 0:01:24  lr: 0.000106  loss: 4.1519 (4.0723)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [47]  [100/195]  eta: 0:01:16  lr: 0.000106  loss: 4.1264 (4.0733)  time: 0.7995  data: 0.0003  max mem: 6883
Epoch: [47]  [110/195]  eta: 0:01:08  lr: 0.000106  loss: 4.1264 (4.0764)  time: 0.7990  data: 0.0002  max mem: 6883
Epoch: [47]  [120/195]  eta: 0:01:00  lr: 0.000106  loss: 4.1255 (4.0764)  time: 0.7977  data: 0.0002  max mem: 6883
Epoch: [47]  [130/195]  eta: 0:00:52  lr: 0.000106  loss: 4.0758 (4.0727)  time: 0.7966  data: 0.0002  max mem: 6883
Epoch: [47]  [140/195]  eta: 0:00:44  lr: 0.000106  loss: 4.0880 (4.0720)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [47]  [150/195]  eta: 0:00:36  lr: 0.000106  loss: 4.0880 (4.0691)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [47]  [160/195]  eta: 0:00:28  lr: 0.000106  loss: 4.1350 (4.0697)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [47]  [170/195]  eta: 0:00:20  lr: 0.000106  loss: 4.0803 (4.0673)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [47]  [180/195]  eta: 0:00:12  lr: 0.000106  loss: 3.9873 (4.0652)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [47]  [190/195]  eta: 0:00:04  lr: 0.000106  loss: 4.0559 (4.0667)  time: 0.7964  data: 0.0002  max mem: 6883
Epoch: [47]  [194/195]  eta: 0:00:00  lr: 0.000106  loss: 4.0559 (4.0684)  time: 0.7970  data: 0.0001  max mem: 6883
Epoch: [47] Total time: 0:02:36 (0.8015 s / it)
Averaged stats: lr: 0.000106  loss: 4.0559 (4.0718)
Test:  [ 0/53]  eta: 0:00:47  loss: 2.8695 (2.8695)  acc1: 36.9792 (36.9792)  acc5: 65.6250 (65.6250)  time: 0.8892  data: 0.5692  max mem: 6883
Test:  [10/53]  eta: 0:00:16  loss: 2.8547 (2.8316)  acc1: 33.8542 (34.4697)  acc5: 65.6250 (64.5833)  time: 0.3758  data: 0.0520  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8222 (2.8091)  acc1: 34.3750 (34.8214)  acc5: 65.6250 (65.5506)  time: 0.3232  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8131 (2.8112)  acc1: 34.3750 (34.7782)  acc5: 65.6250 (65.4066)  time: 0.3227  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8131 (2.8132)  acc1: 33.3333 (34.4639)  acc5: 64.5833 (65.4599)  time: 0.3230  data: 0.0003  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.7977 (2.8157)  acc1: 33.8542 (34.3137)  acc5: 63.5417 (65.2471)  time: 0.3239  data: 0.0002  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.7727 (2.8094)  acc1: 34.3750 (34.4100)  acc5: 66.1458 (65.2700)  time: 0.3094  data: 0.0002  max mem: 6883
Test: Total time: 0:00:17 (0.3302 s / it)
* Acc@1 34.410 Acc@5 65.270 loss 2.809
Accuracy of the network on the 10000 test images: 34.4%
Max accuracy: 34.41%
Epoch: [48]  [  0/195]  eta: 0:04:33  lr: 0.000104  loss: 4.3109 (4.3109)  time: 1.4021  data: 0.6003  max mem: 6883
Epoch: [48]  [ 10/195]  eta: 0:02:37  lr: 0.000104  loss: 4.1839 (4.1622)  time: 0.8535  data: 0.0548  max mem: 6883
Epoch: [48]  [ 20/195]  eta: 0:02:24  lr: 0.000104  loss: 4.1597 (4.1283)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [48]  [ 30/195]  eta: 0:02:14  lr: 0.000104  loss: 4.0902 (4.1090)  time: 0.7967  data: 0.0002  max mem: 6883
Epoch: [48]  [ 40/195]  eta: 0:02:05  lr: 0.000104  loss: 4.1526 (4.1029)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [48]  [ 50/195]  eta: 0:01:57  lr: 0.000104  loss: 4.2011 (4.1182)  time: 0.7976  data: 0.0002  max mem: 6883
Epoch: [48]  [ 60/195]  eta: 0:01:48  lr: 0.000104  loss: 4.1859 (4.1169)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [48]  [ 70/195]  eta: 0:01:40  lr: 0.000104  loss: 4.1397 (4.1165)  time: 0.7972  data: 0.0002  max mem: 6883
Epoch: [48]  [ 80/195]  eta: 0:01:32  lr: 0.000104  loss: 4.1406 (4.1147)  time: 0.7970  data: 0.0002  max mem: 6883
Epoch: [48]  [ 90/195]  eta: 0:01:24  lr: 0.000104  loss: 4.1586 (4.1142)  time: 0.7961  data: 0.0002  max mem: 6883
Epoch: [48]  [100/195]  eta: 0:01:16  lr: 0.000104  loss: 4.0138 (4.1005)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [48]  [110/195]  eta: 0:01:08  lr: 0.000104  loss: 4.0671 (4.1093)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [48]  [120/195]  eta: 0:01:00  lr: 0.000104  loss: 4.2040 (4.1104)  time: 0.7975  data: 0.0003  max mem: 6883
Epoch: [48]  [130/195]  eta: 0:00:52  lr: 0.000104  loss: 4.1415 (4.1123)  time: 0.7971  data: 0.0003  max mem: 6883
Epoch: [48]  [140/195]  eta: 0:00:44  lr: 0.000104  loss: 4.1238 (4.1096)  time: 0.7982  data: 0.0003  max mem: 6883
Epoch: [48]  [150/195]  eta: 0:00:36  lr: 0.000104  loss: 3.9208 (4.0963)  time: 0.7985  data: 0.0003  max mem: 6883
Epoch: [48]  [160/195]  eta: 0:00:28  lr: 0.000104  loss: 3.9155 (4.0907)  time: 0.7989  data: 0.0002  max mem: 6883
Epoch: [48]  [170/195]  eta: 0:00:20  lr: 0.000104  loss: 4.1289 (4.0944)  time: 0.7983  data: 0.0002  max mem: 6883
Epoch: [48]  [180/195]  eta: 0:00:12  lr: 0.000104  loss: 4.1403 (4.0969)  time: 0.7986  data: 0.0002  max mem: 6883
Epoch: [48]  [190/195]  eta: 0:00:04  lr: 0.000104  loss: 4.0748 (4.0933)  time: 0.7986  data: 0.0002  max mem: 6883
Epoch: [48]  [194/195]  eta: 0:00:00  lr: 0.000104  loss: 4.0748 (4.0952)  time: 0.7980  data: 0.0001  max mem: 6883
Epoch: [48] Total time: 0:02:36 (0.8014 s / it)
Averaged stats: lr: 0.000104  loss: 4.0748 (4.0753)
Test:  [ 0/53]  eta: 0:00:44  loss: 2.8292 (2.8292)  acc1: 38.0208 (38.0208)  acc5: 67.1875 (67.1875)  time: 0.8372  data: 0.5144  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 2.8292 (2.8187)  acc1: 33.8542 (33.0966)  acc5: 65.1042 (64.6307)  time: 0.3718  data: 0.0471  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.7969 (2.7927)  acc1: 34.3750 (34.1022)  acc5: 65.1042 (65.3274)  time: 0.3237  data: 0.0004  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.7969 (2.7950)  acc1: 35.4167 (34.3918)  acc5: 63.5417 (64.8858)  time: 0.3231  data: 0.0004  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.8136 (2.7983)  acc1: 34.3750 (34.2607)  acc5: 63.5417 (64.7866)  time: 0.3232  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:01  loss: 2.7693 (2.7992)  acc1: 34.3750 (34.3239)  acc5: 63.5417 (64.6650)  time: 0.3237  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.7618 (2.7918)  acc1: 34.3750 (34.3900)  acc5: 64.0625 (64.7000)  time: 0.3091  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3294 s / it)
* Acc@1 34.390 Acc@5 64.700 loss 2.792
Accuracy of the network on the 10000 test images: 34.4%
Max accuracy: 34.41%
Epoch: [49]  [  0/195]  eta: 0:04:37  lr: 0.000102  loss: 4.2534 (4.2534)  time: 1.4216  data: 0.5043  max mem: 6883
Epoch: [49]  [ 10/195]  eta: 0:02:38  lr: 0.000102  loss: 4.0949 (4.0692)  time: 0.8571  data: 0.0461  max mem: 6883
Epoch: [49]  [ 20/195]  eta: 0:02:24  lr: 0.000102  loss: 4.0949 (4.0627)  time: 0.7985  data: 0.0003  max mem: 6883
Epoch: [49]  [ 30/195]  eta: 0:02:15  lr: 0.000102  loss: 4.1119 (4.0514)  time: 0.7977  data: 0.0003  max mem: 6883
Epoch: [49]  [ 40/195]  eta: 0:02:06  lr: 0.000102  loss: 4.0177 (4.0411)  time: 0.7984  data: 0.0003  max mem: 6883
Epoch: [49]  [ 50/195]  eta: 0:01:57  lr: 0.000102  loss: 3.9851 (4.0463)  time: 0.7988  data: 0.0003  max mem: 6883
Epoch: [49]  [ 60/195]  eta: 0:01:49  lr: 0.000102  loss: 4.0122 (4.0433)  time: 0.7981  data: 0.0002  max mem: 6883
Epoch: [49]  [ 70/195]  eta: 0:01:40  lr: 0.000102  loss: 4.1122 (4.0520)  time: 0.7973  data: 0.0002  max mem: 6883
Epoch: [49]  [ 80/195]  eta: 0:01:32  lr: 0.000102  loss: 4.1476 (4.0550)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [49]  [ 90/195]  eta: 0:01:24  lr: 0.000102  loss: 3.9915 (4.0418)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [49]  [100/195]  eta: 0:01:16  lr: 0.000102  loss: 4.1135 (4.0469)  time: 0.7979  data: 0.0002  max mem: 6883
Epoch: [49]  [110/195]  eta: 0:01:08  lr: 0.000102  loss: 4.1383 (4.0516)  time: 0.7969  data: 0.0002  max mem: 6883
Epoch: [49]  [120/195]  eta: 0:01:00  lr: 0.000102  loss: 4.2091 (4.0677)  time: 0.7997  data: 0.0003  max mem: 6883
Epoch: [49]  [130/195]  eta: 0:00:52  lr: 0.000102  loss: 4.2219 (4.0753)  time: 0.7994  data: 0.0003  max mem: 6883
Epoch: [49]  [140/195]  eta: 0:00:44  lr: 0.000102  loss: 4.2102 (4.0786)  time: 0.7973  data: 0.0003  max mem: 6883
Epoch: [49]  [150/195]  eta: 0:00:36  lr: 0.000102  loss: 4.1728 (4.0821)  time: 0.7978  data: 0.0003  max mem: 6883
Epoch: [49]  [160/195]  eta: 0:00:28  lr: 0.000102  loss: 4.1732 (4.0872)  time: 0.7972  data: 0.0003  max mem: 6883
Epoch: [49]  [170/195]  eta: 0:00:20  lr: 0.000102  loss: 4.1144 (4.0827)  time: 0.7976  data: 0.0003  max mem: 6883
Epoch: [49]  [180/195]  eta: 0:00:12  lr: 0.000102  loss: 4.0906 (4.0816)  time: 0.7982  data: 0.0002  max mem: 6883
Epoch: [49]  [190/195]  eta: 0:00:04  lr: 0.000102  loss: 4.0906 (4.0833)  time: 0.7974  data: 0.0002  max mem: 6883
Epoch: [49]  [194/195]  eta: 0:00:00  lr: 0.000102  loss: 4.1866 (4.0850)  time: 0.7976  data: 0.0001  max mem: 6883
Epoch: [49] Total time: 0:02:36 (0.8018 s / it)
Averaged stats: lr: 0.000102  loss: 4.1866 (4.0787)
Test:  [ 0/53]  eta: 0:00:42  loss: 2.8303 (2.8303)  acc1: 36.4583 (36.4583)  acc5: 66.6667 (66.6667)  time: 0.7984  data: 0.4768  max mem: 6883
Test:  [10/53]  eta: 0:00:15  loss: 2.8315 (2.8195)  acc1: 33.3333 (33.8068)  acc5: 65.1042 (64.3466)  time: 0.3665  data: 0.0437  max mem: 6883
Test:  [20/53]  eta: 0:00:11  loss: 2.8114 (2.7933)  acc1: 35.4167 (34.6478)  acc5: 64.0625 (64.7817)  time: 0.3225  data: 0.0003  max mem: 6883
Test:  [30/53]  eta: 0:00:07  loss: 2.8073 (2.7963)  acc1: 35.4167 (34.5262)  acc5: 64.0625 (64.4321)  time: 0.3225  data: 0.0003  max mem: 6883
Test:  [40/53]  eta: 0:00:04  loss: 2.7970 (2.7987)  acc1: 33.3333 (34.1209)  acc5: 63.5417 (64.4055)  time: 0.3228  data: 0.0002  max mem: 6883
Test:  [50/53]  eta: 0:00:00  loss: 2.7828 (2.8010)  acc1: 33.3333 (33.9052)  acc5: 62.5000 (64.1851)  time: 0.3235  data: 0.0001  max mem: 6883
Test:  [52/53]  eta: 0:00:00  loss: 2.7742 (2.7932)  acc1: 33.8542 (33.9400)  acc5: 65.6250 (64.2300)  time: 0.3088  data: 0.0001  max mem: 6883
Test: Total time: 0:00:17 (0.3294 s / it)
* Acc@1 33.940 Acc@5 64.230 loss 2.793
Accuracy of the network on the 10000 test images: 33.9%
Max accuracy: 34.41%
Training time 2:25:00
Total execution time: 8723 seconds
