Caltech-UCSD Birds-200-2011: Fine-Grained Bird Species Classification (200 classes)

Problem statement
Participants must build a model that classifies bird photographs into one of 200 species. This is a challenging fine-grained visual recognition problem because the classes are visually similar. In addition to the RGB images, binary segmentation masks isolating the bird are provided for both the training and test sets to encourage effective preprocessing and feature engineering.

Files you will use
- train.csv: Training metadata with two columns: id, label. Each row corresponds to one training example. id matches an image file in train_images and its corresponding mask in train_segmentations.
- test.csv: Test metadata with a single column: id. Each id matches an image file in test_images and its corresponding mask in test_segmentations.
- classes.csv: List of class labels with a single column: label. These are the only valid labels for the competition.
- train_images/: Color JPEG photographs for training. File names are id.jpg.
- test_images/: Color JPEG photographs for testing. File names are id.jpg.
- train_segmentations/: Binary PNG masks (same resolution as the associated image) indicating bird foreground for training. File names are id.png.
- test_segmentations/: Binary PNG masks indicating bird foreground for testing. File names are id.png.
- sample_submission.csv: A valid sample submission with random labels; format described below.
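
As one way to use the masks listed above, the sketch below crops an image to the bird's bounding box and blanks out the background. It is a minimal illustration, not a prescribed pipeline. The function names (load_pair, mask_bbox, apply_mask) are hypothetical, and the code assumes that foreground pixels are stored as any non-zero value and that the mask has the same height and width as its image.

```python
import numpy as np


def load_pair(image_path, mask_path):
    """Load an RGB image and its mask as NumPy arrays (requires Pillow)."""
    from PIL import Image  # imported lazily so the helpers below need only NumPy

    image = np.asarray(Image.open(image_path).convert("RGB"))
    mask = np.asarray(Image.open(mask_path).convert("L"))
    return image, mask


def _foreground(mask):
    """Boolean foreground map. Assumption: any non-zero pixel is foreground."""
    fg = np.asarray(mask) > 0
    if fg.ndim == 3:  # collapse a multi-channel mask to a single channel
        fg = fg.any(axis=2)
    return fg


def mask_bbox(mask):
    """Return (top, bottom, left, right) of the foreground, or None if it is empty.

    bottom and right are exclusive, so image[top:bottom, left:right] is the crop.
    """
    fg = _foreground(mask)
    rows = np.flatnonzero(fg.any(axis=1))
    cols = np.flatnonzero(fg.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def apply_mask(image, mask, fill=0):
    """Replace background pixels of an H x W x C image with a constant fill value."""
    image = np.asarray(image)
    fg = _foreground(mask)
    return np.where(fg[..., None], image, fill).astype(image.dtype)
```

For a training id, a typical call would be load_pair("train_images/<id>.jpg", "train_segmentations/<id>.png"), followed by mask_bbox to crop and, optionally, apply_mask to suppress the background. Whether cropping or background removal actually helps is something to check on a validation split.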

Submission format
- CSV with two columns and a header: id,label
- id must exactly match those in test.csv
- label must be one of the values listed in classes.csv
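
The rules above can be checked before uploading. The following is a minimal sketch using only the Python standard library. The helper names (read_column, validate_submission, write_submission) are hypothetical, and the check that every test id appears exactly once is an assumption about how "exactly match" is enforced.

```python
import csv


def read_column(path, column):
    """Read one column of a CSV file with a header row as a list of strings."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def validate_submission(rows, test_ids, valid_labels):
    """Return a list of problems found in rows; an empty list means none were found.

    rows is a list of dicts with "id" and "label" keys.
    Assumption: each id from test.csv must appear exactly once.
    """
    problems = []
    expected = set(test_ids)
    labels = set(valid_labels)
    seen = set()
    for row in rows:
        rid, label = row["id"], row["label"]
        if rid not in expected:
            problems.append(f"unknown id: {rid}")
        elif rid in seen:
            problems.append(f"duplicate id: {rid}")
        seen.add(rid)
        if label not in labels:
            problems.append(f"invalid label for {rid}: {label}")
    for rid in sorted(expected - seen):
        problems.append(f"missing id: {rid}")
    return problems


def write_submission(path, rows):
    """Write rows as a two-column CSV with the header id,label."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "label"])
        writer.writeheader()
        writer.writerows(rows)
```

In practice, test_ids would come from read_column("test.csv", "id") and valid_labels from read_column("classes.csv", "label"); write_submission is called only once validate_submission returns an empty list.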

Evaluation
- Macro-averaged F1 score across the 200 classes.
  • Each class’s F1 is the harmonic mean of that class’s precision and recall, treating the class as “positive” and all other classes as negative.
  • The final score is the unweighted mean of per-class F1 across all classes.
  • If a class is never predicted, it has no true positives and its precision is undefined; its F1 is then defined as 0.
This metric rewards balanced performance across all species, not only the most common ones.
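
For local validation, the metric described above can be sketched in a few lines of standard-library Python. The function name macro_f1 is illustrative and this is not the official scoring code. It uses the identity F1 = 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall, and assigns 0 to any class whose denominator is zero.

```python
from collections import Counter


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 over every label in classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # pred was predicted but is wrong
            fn[truth] += 1  # truth was the right answer but was missed
    total = 0.0
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # A class with no true instances and no predictions scores 0.
        total += 2 * tp[c] / denom if denom else 0.0
    return total / len(classes)


# Example: class "c" is never predicted, so it contributes 0 to the mean.
score = macro_f1(["a", "a", "b", "c"], ["a", "b", "b", "b"], ["a", "b", "c"])
```

Here classes should be the full list from classes.csv, so that species missing from a validation fold still count toward the average. If scikit-learn is available, f1_score with average="macro", an explicit labels list, and zero_division=0 should give the same value.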

Notes
- All images and masks are anonymized; file names do not reveal the class.
- Using the segmentation masks is optional, but they can help with preprocessing (for example, cropping to the bird or suppressing the background) and with modeling.
- The data are already split into train and test sets, and the training set covers all 200 species.