# Aesthetic Predictor V2.5

Aesthetic Predictor V2.5 is a SigLIP-based predictor that evaluates the aesthetics of an image on a scale from 1 to 10.

Compared to [Aesthetic Predictor V2](https://github.com/christophschuhmann/improved-aesthetic-predictor), it has been improved to evaluate a wider range of image domains such as illustrations.

<p align="center">
  <img src="./assets/example.png" width=75%>
</p>

Unlike with V2, a score of **_5.5+_** is considered great.

**You can try Aesthetic Predictor V2.5 at Hugging Face Spaces!**

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/discus0434/aesthetic-predictor-v2-5)

## Installation

```bash
pip install aesthetic-predictor-v2-5
```

## Usage

This package exposes an interface similar to Hugging Face Transformers and nearly identical to [Simple Aesthetics Predictor](https://pypi.org/project/simple-aesthetics-predictor/), making it easy to use.

```python
from pathlib import Path

import torch
from aesthetic_predictor_v2_5 import convert_v2_5_from_siglip
from PIL import Image

SAMPLE_IMAGE_PATH = Path("path/to/image")

# load model and preprocessor
model, preprocessor = convert_v2_5_from_siglip(
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
model = model.to(torch.bfloat16).cuda()

# load image to evaluate
image = Image.open(SAMPLE_IMAGE_PATH).convert("RGB")

# preprocess image
pixel_values = (
    preprocessor(images=image, return_tensors="pt")
    .pixel_values.to(torch.bfloat16)
    .cuda()
)

# predict aesthetic score
with torch.inference_mode():
    score = model(pixel_values).logits.squeeze().float().cpu().numpy()

# print result
print(f"Aesthetics score: {score:.2f}")
```

With ComfyUI, you can use [this custom node](https://github.com/discus0434/comfyui-aesthetic-predictor-v2-5).

## Batch Inference

For batch processing of multiple images or videos, use the `batch_inference.py` script.

### Image Processing

```bash
# Process images from a directory
python batch_inference.py --image_dir ./images --output_dir ./outputs

# Process images from a file list
python batch_inference.py --image_list image_list.txt --output_dir ./outputs

# Specify image extensions
python batch_inference.py --image_dir ./images --output_dir ./outputs --extensions jpg png jpeg
```

### Video Processing

The script can evaluate video aesthetics by extracting keyframes and aggregating their scores:

```bash
# Process videos from a directory
python batch_inference.py --video_dir ./videos --output_dir ./outputs

# Process videos from a file list
python batch_inference.py --video_list video_list.txt --output_dir ./outputs

# Extract more keyframes per video
python batch_inference.py --video_dir ./videos --output_dir ./outputs --num_frames 20

# Extract keyframes at specific time intervals (e.g., every 2 seconds)
python batch_inference.py --video_dir ./videos --output_dir ./outputs --num_frames 20 --frame_interval 2.0 --frame_extraction_method interval
```

**Video Processing Method:**
- Extracts keyframes from each video (uniformly distributed or at fixed time intervals)
- Evaluates aesthetic score for each keyframe
- Aggregates scores (mean) to get the overall video aesthetic score
- Provides detailed statistics including min/max/std of frame scores
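
The two keyframe selection strategies above can be sketched as plain index arithmetic. This is a minimal illustration, not the script's actual code: the function names are hypothetical, and both the mid-segment sampling and the cap on interval-based frames are assumptions. In practice the total frame count and FPS would come from OpenCV (`cv2.CAP_PROP_FRAME_COUNT`, `cv2.CAP_PROP_FPS`), and each index would be used to seek with `cv2.CAP_PROP_POS_FRAMES`.

```python
from typing import List, Optional


def uniform_frame_indices(total_frames: int, num_frames: int) -> List[int]:
    """Pick up to num_frames indices spread evenly across the video."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    count = min(num_frames, total_frames)
    # Assumption: sample the middle of each equal-length segment.
    step = total_frames / count
    return [int(step * i + step / 2) for i in range(count)]


def interval_frame_indices(
    total_frames: int,
    fps: float,
    interval_s: float,
    max_frames: Optional[int] = None,
) -> List[int]:
    """Pick one frame every interval_s seconds, optionally capped at max_frames."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    # Convert the time interval to a whole number of frames (at least 1).
    stride = max(1, round(fps * interval_s))
    indices = list(range(0, total_frames, stride))
    return indices if max_frames is None else indices[:max_frames]
```

For a 10-second clip at 30 FPS, `interval_frame_indices(300, 30.0, 2.0)` selects the frames at 0, 2, 4, 6, and 8 seconds.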

### Output Files

The script will generate:
- `results.json`: Detailed results for each image/video with aesthetic scores
- `metrics.json`: Statistical summary (mean, min, max, std)
- `failed_images.txt` or `failed_videos.txt`: List of files that failed to process (if any)

For videos, each result includes:
- Overall aesthetic score (mean of keyframe scores)
- Individual keyframe scores
- Min/max/std of frame scores
- Number of keyframes extracted
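
As a rough sketch of how the per-video fields listed above could be derived from keyframe scores, consider the helper below. The function name and dictionary keys are hypothetical and may not match the actual layout of `results.json`. Using the population standard deviation is also an assumption.

```python
import statistics
from typing import Dict, List


def summarize_frame_scores(frame_scores: List[float]) -> Dict[str, object]:
    """Aggregate keyframe scores into a per-video summary (hypothetical keys)."""
    if not frame_scores:
        raise ValueError("at least one keyframe score is required")
    return {
        # Overall video score: mean of the keyframe scores.
        "aesthetic_score": statistics.fmean(frame_scores),
        "frame_scores": list(frame_scores),
        "min_score": min(frame_scores),
        "max_score": max(frame_scores),
        # Assumption: population standard deviation (divides by N).
        "std_score": statistics.pstdev(frame_scores),
        "num_frames": len(frame_scores),
    }
```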

### Batch Inference Options

**Image Options:**
- `--image_dir`: Directory containing images to process
- `--image_list`: Text file with image paths (one per line)
- `--extensions`: Image file extensions to process (default: jpg jpeg png bmp webp)
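
To illustrate what the image options above imply, here is a minimal sketch of gathering input paths from a directory or a list file. The helper names are hypothetical. Non-recursive scanning and case-insensitive extension matching are both assumptions about the script's behavior.

```python
from pathlib import Path
from typing import Iterable, List

# Defaults taken from the documented --extensions option.
DEFAULT_EXTENSIONS = ("jpg", "jpeg", "png", "bmp", "webp")


def collect_from_dir(
    image_dir, extensions: Iterable[str] = DEFAULT_EXTENSIONS
) -> List[Path]:
    """Return files directly inside image_dir whose suffix matches extensions."""
    # Normalize "jpg" and ".JPG" alike to ".jpg" for comparison.
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    return sorted(
        p
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def collect_from_list(list_file) -> List[Path]:
    """Read one path per line, skipping blank lines."""
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    return [Path(line.strip()) for line in lines if line.strip()]
```

Replacing `iterdir()` with `rglob("*")` would extend the scan to subdirectories.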

**Video Options:**
- `--video_dir`: Directory containing videos to process
- `--video_list`: Text file with video paths (one per line)
- `--video_extensions`: Video file extensions to process (default: mp4 avi mov mkv flv webm)
- `--num_frames`: Number of keyframes to extract per video (default: 10)
- `--frame_interval`: Time interval in seconds between keyframes (if specified, uses interval method)
- `--frame_extraction_method`: Extraction method, either `uniform` (evenly spaced, the default) or `interval` (time-based)

**Common Options:**
- `--output_dir`: Output directory for results (required)
- `--device`: Device to use (cuda/cpu, default: auto-detect)
- `--no_save`: Don't save results to files (only print to console)
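
The auto-detect default for `--device` could be implemented along these lines. This is a sketch under assumptions: `resolve_device` is a hypothetical helper, and the script's actual detection logic may differ.

```python
from typing import Optional


def resolve_device(requested: Optional[str] = None) -> str:
    """Return the requested device, or fall back to CUDA if available, else CPU."""
    if requested:
        return requested
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe; CPU is the safe answer.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```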

**Note:** Video processing requires `opencv-python` to be installed:

```bash
pip install opencv-python
```
