Setting up data loaders...
Using 8 workers for data loading
Train dataset size: 1000
Setting up Improved NPT model with variance-aware attention regularization and entropy maximization...
  Momentum beta: 0.9
  Adaptation alpha: 0.1
  Min weight factor: 0.1
  Max weight factor: 3.0
  Warmup steps: 50
  Lambda var: 0.1
  Lambda entropy: 0.05
  Epsilon: 1e-08
Loading CLIP (backbone: ViT-B/16)
Building NPT Custom CLIP with variance-aware attention regularization and entropy maximization
Initializing a generic context
Initial context: "X X X X X X X X X X X X X X X X"
Number of context words (tokens): 16
Initializing nuisance context vector
Turning off gradients in both the image and the text encoder
Improved NPT Model with variance-aware attention regularization and entropy maximization setup completed
Starting Improved NPT training with variance-aware attention regularization and entropy maximization...
