# Updated Reformat Scripts Usage Guide

## 📁 Location
The scripts are now located in: `schema_induction_pipeline_copy/evaluation/test_inference/reformat/`

## 🚀 Updated Features

### ✅ **New Argument Handling**
- **`--input` / `-i`**: Specify custom input path
- **`--output` / `-o`**: Specify custom output path (optional)
- **Smart defaults**: When `--output` is omitted, the output is saved in the same directory as the input
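
The argument handling above could be wired up roughly as follows. This is a minimal sketch, not the scripts' actual code: the function name `build_parser` and the default input file name `high_level_codes.parquet` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the --input/-i and --output/-o options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Reformat a high-level codes parquet file."
    )
    parser.add_argument(
        "--input", "-i",
        default="high_level_codes.parquet",  # assumed default file name
        help="Path to the input parquet file.",
    )
    parser.add_argument(
        "--output", "-o",
        default=None,  # None means: derive the path from the input
        help="Optional path for the reformatted file.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-i", "data/high_level_codes.parquet"])
print(args.input, args.output)
```

Leaving `--output` as `None` lets the script decide the destination later, which is what makes the same-directory default possible.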

### ✅ **Automatic Output Naming**
- **Simple script**: `{input_name}_reformatted.parquet`
- **Comprehensive script**: `{input_name}_{mode}.parquet` (e.g., `high_level_codes_basic.parquet`)
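
The naming rules above can be expressed with `pathlib`. The sketch below is illustrative only; `default_output_path` is a hypothetical helper name, and the real scripts may build the path differently.

```python
from pathlib import Path
from typing import Optional


def default_output_path(input_path: str, mode: Optional[str] = None) -> Path:
    """Derive the output path next to the input file (hypothetical helper).

    Without a mode the suffix is "reformatted" (simple script);
    with a mode the suffix is the mode name (comprehensive script).
    """
    src = Path(input_path)
    suffix = mode if mode else "reformatted"
    # with_name keeps the parent directory and swaps only the file name.
    return src.with_name(f"{src.stem}_{suffix}.parquet")


print(default_output_path("/data/high_level_codes.parquet"))
print(default_output_path("/data/high_level_codes.parquet", mode="basic"))
```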

---

## 📝 Simple Script Usage

### **Basic Usage**
```bash
# Use default input file in current directory
python reformat_high_level_codes.py

# Specify custom input path (output saved in same directory)
python reformat_high_level_codes.py --input /path/to/high_level_codes.parquet

# Specify both input and output paths
python reformat_high_level_codes.py --input /path/to/input.parquet --output /path/to/output.parquet
```

### **Examples**
```bash
# Reformat the original file
python reformat_high_level_codes.py --input ../../../result_storage/aliabdaal/Q1/iteration_01/high_level_codes/high_level_codes.parquet

# This creates: high_level_codes_reformatted.parquet in the same directory
```

---

## 🚀 Comprehensive Script Usage

### **Basic Usage**
```bash
# Analyze structure only
python comprehensive_reformat_script.py --input /path/to/file.parquet --analyze-only

# Basic reformatting (default mode)
python comprehensive_reformat_script.py --input /path/to/file.parquet

# Different modes
python comprehensive_reformat_script.py --input /path/to/file.parquet --mode minimal
python comprehensive_reformat_script.py --input /path/to/file.parquet --mode expanded
```

### **Examples**
```bash
# Analyze the original file structure
python comprehensive_reformat_script.py --input ../../../result_storage/aliabdaal/Q1/iteration_01/high_level_codes/high_level_codes.parquet --analyze-only

# Basic reformatting (creates: high_level_codes_basic.parquet)
python comprehensive_reformat_script.py --input ../../../result_storage/aliabdaal/Q1/iteration_01/high_level_codes/high_level_codes.parquet --mode basic

# Minimal reformatting (creates: high_level_codes_minimal.parquet)
python comprehensive_reformat_script.py --input ../../../result_storage/aliabdaal/Q1/iteration_01/high_level_codes/high_level_codes.parquet --mode minimal
```

---

## 📊 Output Comparison

| Script | Mode | Output File | Columns | Use Case |
|--------|------|-------------|---------|----------|
| Simple | - | `{input}_reformatted.parquet` | 4 (tag, cluster_id, source_codes, num_source_codes) | Quick fix |
| Comprehensive | basic | `{input}_basic.parquet` | 4 (tag, cluster_id, source_codes, num_source_codes) | Standard use |
| Comprehensive | minimal | `{input}_minimal.parquet` | 2 (tag, cluster_id) | Minimal footprint |
| Comprehensive | expanded | `{input}_expanded.parquet` | 5 (tag plus all original columns) | Maximum data |
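
To make the table concrete, here is one way the column set for each mode could be selected. It is a sketch under assumptions: the names `MODE_COLUMNS` and `columns_for_mode` are hypothetical, and the real script may choose columns differently, especially in `expanded` mode.

```python
from typing import Iterable, List

# Column lists taken from the table above; "expanded" keeps everything.
MODE_COLUMNS = {
    "basic": ["tag", "cluster_id", "source_codes", "num_source_codes"],
    "minimal": ["tag", "cluster_id"],
}


def columns_for_mode(mode: str, available: Iterable[str]) -> List[str]:
    """Return the columns to keep for a mode, given the columns that exist."""
    available = list(available)
    if "tag" not in available:
        raise KeyError("tag")
    if mode == "expanded":
        # Keep every column, with "tag" moved to the front.
        return ["tag"] + [c for c in available if c != "tag"]
    if mode not in MODE_COLUMNS:
        raise ValueError(f"Unknown mode: {mode!r}")
    wanted = MODE_COLUMNS[mode]
    missing = [c for c in wanted if c not in available]
    if missing:
        raise KeyError(f"Missing column(s) for mode {mode!r}: {missing}")
    return wanted


cols = ["cluster_id", "source_codes", "num_source_codes", "tag"]
print(columns_for_mode("minimal", cols))
```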

---

## 🎯 **Recommended Usage**

### **For Quick Fixes**
```bash
python reformat_high_level_codes.py --input /path/to/high_level_codes.parquet
```

### **For Production Use**
```bash
# First analyze the structure
python comprehensive_reformat_script.py --input /path/to/high_level_codes.parquet --analyze-only

# Then reformat with your preferred mode
python comprehensive_reformat_script.py --input /path/to/high_level_codes.parquet --mode basic
```

---

## ✅ **Validation**

Each reformatted file is automatically validated to confirm that it:
- ✅ Contains the required `'tag'` column
- ✅ Can be read by pandas
- ✅ Resolves the original `KeyError: 'tag'` issue
- ✅ Maintains data integrity
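
The same checks can be run by hand on any output file. The sketch below assumes pandas with a parquet engine (such as `pyarrow`) is installed; `missing_columns` and `validate_reformatted_file` are hypothetical names, not functions exported by the scripts.

```python
from typing import Iterable, List

REQUIRED_COLUMNS = ("tag",)


def missing_columns(
    columns: Iterable[str], required: Iterable[str] = REQUIRED_COLUMNS
) -> List[str]:
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


def validate_reformatted_file(path: str) -> None:
    """Raise if the file cannot be read by pandas or lacks the 'tag' column."""
    import pandas as pd  # imported lazily; needs a parquet engine installed

    df = pd.read_parquet(path)  # raises if the file is unreadable
    missing = missing_columns(df.columns)
    if missing:
        raise ValueError(f"{path} is missing required column(s): {missing}")


print(missing_columns(["cluster_id", "source_codes"]))
```

A file that passes `validate_reformatted_file` should no longer trigger `KeyError: 'tag'` when downstream code accesses `df["tag"]`.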

---

## 🔧 **Error Handling**

The scripts now handle errors more gracefully:
- A clear message when the input file is not found
- Suggestions for correct usage
- Automatic creation of the output directory
- Validation of output files
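
The input check and output-directory creation described above might look like the following. This is a sketch only; `prepare_paths` is a hypothetical helper and the error text is illustrative, not the scripts' actual message.

```python
from pathlib import Path


def prepare_paths(input_path: str, output_path: str) -> Path:
    """Check that the input exists and make sure the output directory does too."""
    src = Path(input_path)
    if not src.is_file():
        raise FileNotFoundError(
            f"Input file not found: {src}. Pass a valid path with --input / -i."
        )
    dst = Path(output_path)
    # Create missing parent directories; do nothing if they already exist.
    dst.parent.mkdir(parents=True, exist_ok=True)
    return dst
```

Failing early on a missing input avoids creating empty output directories for a run that cannot succeed.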

---

## 📋 **Quick Reference**

| Flag | Purpose |
|---------|---------|
| `--input` / `-i` | Specify input file path |
| `--output` / `-o` | Specify output file path (optional) |
| `--mode` | Choose reformatting mode (comprehensive script only) |
| `--analyze-only` | Analyze structure without reformatting (comprehensive script only) |
| `--help` | Show help and examples |

