# UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation

This repository implements **UtilGen**, a framework for utility-centric data generation guided by feedback from downstream tasks. It includes the following components:

- **TODV**: Task-Oriented Data Valuation using a meta-learned weight network
- **MLCO**: Model-Level Generation Capability Optimization with diffusion models
- **ILPO**: Instance-Level Generation Policy Optimization through prompt and noise adaptation
- **classify**: Classifier training using the generated synthetic data


We recommend installing dependencies in an isolated environment, such as a Python virtual environment or a conda environment.

## Dataset Structure

The dataset should be organized in the following format:

```text
dataset/
├── train/
│   ├── class1/
│   │   ├── img1
│   │   ├── img2
│   │   └── ...
│   └── class2/
│       ├── img1
│       ├── img2
│       └── ...
├── valid/         # Used to train the weight network
│   ├── class1/
│   └── class2/
└── test/
    ├── class1/
    └── class2/
```
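Before starting the pipeline, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: `check_dataset` and `list_classes` are hypothetical helper names, and the check assumes that `valid/` and `test/` are expected to contain the same class folders as `train/`, as the tree above suggests.

```bash
# Hypothetical helpers (not shipped with the repository).

# Print the names of the immediate subdirectories of $1, sorted.
list_classes() {
    for d in "$1"/*/; do
        [ -d "$d" ] && basename "$d"
    done | sort
}

# Return 0 if $1 contains train/, valid/ and test/ with matching class folders.
check_dataset() {
    local root="$1" status=0 split expected found

    # All three splits must exist.
    for split in train valid test; do
        if [ ! -d "$root/$split" ]; then
            echo "missing split: $split"
            status=1
        fi
    done
    [ "$status" -eq 0 ] || return 1

    # Assumption: valid/ and test/ use the same class folders as train/.
    expected=$(list_classes "$root/train")
    for split in valid test; do
        found=$(list_classes "$root/$split")
        if [ "$found" != "$expected" ]; then
            echo "class mismatch in $split"
            status=1
        fi
    done
    return "$status"
}
```

After sourcing the snippet, `check_dataset dataset` exits with status 0 when the layout matches and prints the first problems it finds otherwise.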

## Usage Instructions

### 1. Task-Oriented Data Valuation (TODV)

```bash
cd TODV
bash train.sh
```

### 2. Textual Inversion

```bash
cd ../textual_inversion
bash run_textual_inversion.sh
```

### 3. Model-Level Generation Capability Optimization (MLCO)

```bash
cd ../MLCO
bash mlco.sh
```

### 4. Instance-Level Generation Policy Optimization (ILPO)

```bash
cd ../ILPO
bash prompt_op.sh      # Optimize prompts
bash generate.sh       # Generate synthetic images
```

### 5. Train Classifier

```bash
cd ../classify
bash train.sh
```
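The five steps above can also be chained so that a failure in one stage stops the rest. The sketch below is a hypothetical wrapper, not a script shipped with the repository: `run_stage` and `run_pipeline` are illustrative names, and it assumes that the component directories sit side by side under one root (as the `cd ../` commands imply) and that each script is meant to be run from inside its own directory.

```bash
# Hypothetical wrapper (not shipped with the repository).

# Run script $2 from inside directory $1.
# The subshell keeps the caller's working directory unchanged.
run_stage() {
    ( cd "$1" && bash "$2" )
}

# Run all stages in the documented order; && stops at the first failure.
run_pipeline() {
    local root="$1"
    run_stage "$root/TODV" train.sh &&
    run_stage "$root/textual_inversion" run_textual_inversion.sh &&
    run_stage "$root/MLCO" mlco.sh &&
    run_stage "$root/ILPO" prompt_op.sh &&
    run_stage "$root/ILPO" generate.sh &&
    run_stage "$root/classify" train.sh
}
```

After sourcing the snippet from the repository root, `run_pipeline .` runs the stages in order and returns the exit status of the first stage that fails, or 0 if all succeed.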

## Configuration

Each component can be configured by modifying its corresponding `.sh` file. Key configurable parameters include:

- Dataset paths
- Model architectures
- Training hyperparameters such as learning rate, batch size, and number of iterations
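One common way to keep such settings easy to find is to collect them at the top of a script with environment-overridable defaults. The sketch below only illustrates that shell pattern: `print_config`, the variable names, and the default values are placeholders chosen for this example and are not taken from the repository's scripts, so check each `.sh` file for the parameters it actually uses.

```bash
# Hypothetical sketch: names and default values are placeholders,
# NOT the parameters used by the repository's scripts.
print_config() {
    # ${VAR:-default} uses the environment value if set and non-empty,
    # and falls back to the default otherwise.
    local data_dir="${DATA_DIR:-./dataset}"
    local lr="${LEARNING_RATE:-1e-4}"
    local batch_size="${BATCH_SIZE:-32}"
    echo "data_dir=$data_dir lr=$lr batch_size=$batch_size"
}
```

With this pattern, a call such as `BATCH_SIZE=64 print_config` overrides one value for a single run without editing the file.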


