Enhanced Uniform MiniZinc Dataset Build Summary
==================================================

Configuration:
  Minimum instances per model: 3
  Maximum instances per model: unlimited
  Include self-contained models: False
  Skip deduplication: False
  Naming strategy: model_name
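
  The settings above could be captured in a small, serializable record so
  that a build can be repeated later. The sketch below is illustrative only:
  the class name, field names, and helper functions are hypothetical and are
  not taken from the build tool; only the values mirror this summary.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class BuildConfig:
    # Field names are hypothetical; the defaults are the values listed above.
    min_instances: int = 3
    max_instances: Optional[int] = None  # None stands for "unlimited"
    include_self_contained: bool = False
    skip_deduplication: bool = False
    naming_strategy: str = "model_name"


def dump_config(config: BuildConfig) -> str:
    """Serialize the configuration to JSON with a stable key order."""
    return json.dumps(asdict(config), indent=2, sort_keys=True)


def load_config(text: str) -> BuildConfig:
    """Rebuild a configuration from JSON produced by dump_config."""
    return BuildConfig(**json.loads(text))
```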

Processing Results:
  Total models found: 622
  Total instances found: 18488
  Models with any instances: 356
  Models with >=3 instances: 349
  Self-contained models (of the 349): 7
  Non-self-contained models (of the 349): 342
  Pre-deduplication dataset size: 342
  Duplicate groups found: 83
  Duplicate models removed: 115
  Final clean dataset size: 227
  Processing time: 1.4 seconds
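
  The reported counts can be cross-checked with simple arithmetic. The sketch
  below assumes that the self-contained models are counted among the models
  with >=3 instances, and that exactly one model is kept per duplicate group;
  neither assumption is stated by the build tool itself.

```python
# Counts copied from the Processing Results above.
models_with_min_instances = 349
self_contained = 7
non_self_contained = 342
pre_dedup_size = 342
duplicate_groups = 83
duplicates_removed = 115
final_size = 227


def counts_are_consistent() -> bool:
    """Return True if the filtering and deduplication counts add up."""
    return (
        # Assumption: self-contained models are a subset of the 349.
        models_with_min_instances - self_contained == non_self_contained
        and non_self_contained == pre_dedup_size
        and pre_dedup_size - duplicates_removed == final_size
    )


# Assumption: one model survives per duplicate group, so the number of
# models involved in any duplicate group is groups + removed.
models_in_duplicate_groups = duplicate_groups + duplicates_removed
```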

Processing Pipeline:
  1. Discovered all models and instances across the repository
  2. Kept only models with >= 3 instances
  3. Excluded self-contained models
  4. Created initial dataset structure
  5. Detected duplicate models by comparing SHA-256 content hashes
  6. Removed duplicates to create a clean final dataset
  7. Generated comprehensive statistics and documentation
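
  The hash-based duplicate detection and removal in steps 5 and 6 can be
  sketched as follows. This is a minimal illustration, not the tool's code:
  the function names are hypothetical, models are passed as an in-memory
  mapping from name to raw bytes, and keeping the alphabetically first model
  of each group is an assumed policy.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(models):
    """Group model names whose content has the same SHA-256 digest.

    `models` maps a model name to its raw file content (bytes).
    Only groups with more than one member are returned.
    """
    by_digest = defaultdict(list)
    for name in sorted(models):
        digest = hashlib.sha256(models[name]).hexdigest()
        by_digest[digest].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


def deduplicate(models):
    """Drop all but the first name of each duplicate group (assumed policy)."""
    drop = {name for group in duplicate_groups(models) for name in group[1:]}
    return {name: data for name, data in models.items() if name not in drop}
```

  Hashing raw bytes treats two models as duplicates only when they are
  byte-for-byte identical; whether the build normalizes whitespace or comments
  before hashing is not stated in this summary.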

Quality Assurance:
  - Content-based deduplication using SHA-256 hashes
  - MiniZinc parameter detection based on 'type: name' colon declarations
  - Configurable filtering pipeline
  - Comprehensive error handling and logging
  - Reproducible build process with saved configuration
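
  One way to detect parameters through the colon syntax is to look for
  declarations of the form 'type: name;' that have no assignment and no 'var'
  keyword; a model with none of these needs no external data. The sketch below
  is a rough heuristic under those assumptions, not the tool's implementation
  and not a MiniZinc parser: it ignores unassigned enums and can be confused
  by '%' or ';' inside string literals. All names in it are hypothetical.

```python
import re

# Line comments (% ...) and block comments (/* ... */).
_COMMENT = re.compile(r"%[^\n]*|/\*.*?\*/", re.DOTALL)
# "<type>: <identifier>" with nothing after the identifier (no "= value").
_DECL = re.compile(
    r"^(?P<type>[^=]*?):\s*(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*$", re.DOTALL
)
# Items that are never parameter declarations.
_NON_DECL = ("constraint", "solve", "output", "include",
             "predicate", "function", "test", "annotation")


def unassigned_parameters(model_text):
    """Return names declared as 'type: name;' without 'var' or a value."""
    names = []
    for stmt in _COMMENT.sub("", model_text).split(";"):
        stmt = stmt.strip()
        if not stmt or stmt.split(None, 1)[0] in _NON_DECL:
            continue
        match = _DECL.match(stmt)
        if match and not re.search(r"\bvar\b", match.group("type")):
            names.append(match.group("name"))
    return names


def is_self_contained(model_text):
    """A model with no unassigned parameters needs no data file."""
    return not unassigned_parameters(model_text)
```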

Dataset Structure:
  - Each subdirectory contains exactly one model and its instances
  - Subdirectories named using 'model_name' strategy
  - All models require external data files (self-contained models are
    excluded under this configuration)
  - All models have >= 3 instances
  - No duplicate models (verified by content hash)
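
  The structural guarantees listed above can be spot-checked with a short
  script. This is a sketch under assumptions: the function name is
  hypothetical, models are taken to be '.mzn' files, and instances are taken
  to be '.dzn' or '.json' files directly inside each subdirectory.

```python
from pathlib import Path

# Assumed instance file extensions; adjust to the actual dataset layout.
DATA_SUFFIXES = {".dzn", ".json"}


def check_dataset(root, min_instances=3):
    """Return {subdirectory name: [issues]} for subdirectories that break
    the 'one model, at least min_instances instances' layout."""
    report = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = [f for f in sub.iterdir() if f.is_file()]
        models = [f for f in files if f.suffix == ".mzn"]
        instances = [f for f in files if f.suffix in DATA_SUFFIXES]
        issues = []
        if len(models) != 1:
            issues.append(f"expected 1 model, found {len(models)}")
        if len(instances) < min_instances:
            issues.append(
                f"expected >= {min_instances} instances, found {len(instances)}"
            )
        if issues:
            report[sub.name] = issues
    return report
```

  An empty result means every subdirectory passed both checks.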
