# IC Datasheet Analysis Pipeline

This project provides a multi-agent large multimodal model (LMM) framework for labeling IC footprint geometry. It automates locating the Suggested Pad diagrams in datasheet page images, classifying the footprint, planning which parameters to extract, and extracting them.

The pipeline operates in four stages:
1.  **Stage 1: Suggested Pad Localization**: Fuses the predictions of the Diagram Agent and a YOLO object detector to locate the bounding box of the land pattern diagram.
2.  **Stage 2: Footprint Classification & Planning**: Classifies the IC footprint type (e.g., 2-sided, 4-sided) and plans which parameters to extract.
3.  **Stage 3: Parameter Extraction**: Executes the plan from Stage 2 to extract the specific numerical parameters from the localized diagram.
4.  **Stage 4: Pin Description Generation**: Calls a dedicated description-generation tool (`tool_call.py`) to transform the key parameters into per-pin geometry descriptions (i.e., coordinates and dimensions).
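
The internals of `tool_call.py` are not described here, so the following is only a minimal sketch of the kind of transformation Stage 4 performs. It assumes a 2-sided (SOIC-style) footprint and uses hypothetical parameter names (`pin_count`, `pitch`, `row_span`, `pad_length`, `pad_width`), millimetre units, an origin at the footprint centre with Y pointing up, and counter-clockwise pin numbering starting at the top-left pad.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pad:
    pin: int        # 1-based pin number
    x: float        # pad centre X (mm), origin at footprint centre
    y: float        # pad centre Y (mm), Y axis pointing up
    width: float    # pad size along X (mm)
    height: float   # pad size along Y (mm)


def two_sided_pads(pin_count: int, pitch: float, row_span: float,
                   pad_length: float, pad_width: float) -> List[Pad]:
    """Hypothetical 2-sided layout: pins 1..N/2 run down the left row and
    the remaining pins run up the right row (counter-clockwise, top view).
    row_span is the centre-to-centre distance between the two pad rows."""
    if pin_count % 2:
        raise ValueError("a 2-sided footprint needs an even pin count")
    per_side = pin_count // 2
    # Centre each row of pads vertically around y = 0.
    top_y = (per_side - 1) * pitch / 2
    pads = []
    for i in range(per_side):
        # Left row, top to bottom; pads are elongated along X.
        pads.append(Pad(i + 1, -row_span / 2, top_y - i * pitch,
                        pad_length, pad_width))
    for i in range(per_side):
        # Right row, bottom to top.
        pads.append(Pad(per_side + i + 1, row_span / 2,
                        -top_y + i * pitch, pad_length, pad_width))
    return pads


if __name__ == "__main__":
    # Illustrative 8-pin values, not taken from any particular datasheet.
    for pad in two_sided_pads(8, 1.27, 5.4, 1.55, 0.6):
        print(pad)
```

A 4-sided footprint would follow the same pattern with two additional rows along the top and bottom edges, where the pads are elongated along Y instead of X.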

## Requirements

*   Python 3.8+
*   PyTorch (CUDA recommended for GPU acceleration)
*   Transformers
*   Ultralytics YOLOv8
*   Pillow (PIL)
*   tqdm (optional, for progress bars)
*   Qwen2-VL-7B-Instruct (Vision Language Model weights)

You can install the required Python packages using pip:
```bash
pip install torch transformers ultralytics pillow tqdm
```
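
To confirm the packages are importable before launching a long run, a small check along these lines may help. It is a sketch using only the standard library (`importlib.util.find_spec`); the helper names `missing` and `cuda_available` are illustrative, and it does not check for the Qwen2-VL weights, which are supplied through the model path arguments.

```python
import importlib.util

# pip package name -> import name (Pillow is imported as "PIL").
REQUIRED = {"torch": "torch", "transformers": "transformers",
            "ultralytics": "ultralytics", "Pillow": "PIL"}
OPTIONAL = {"tqdm": "tqdm"}


def missing(packages):
    """Return the pip names whose import name cannot be found."""
    return [pip for pip, mod in packages.items()
            if importlib.util.find_spec(mod) is None]


def cuda_available():
    """True if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("missing required:", missing(REQUIRED) or "none")
    print("missing optional:", missing(OPTIONAL) or "none")
    print("CUDA available:", cuda_available())
```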
## Usage
The pipeline is executed from the command line. You must provide paths to your models, data file, and image directory.
### Command
Here is a full example command to run the pipeline:

```bash
python main_pipeline.py \
    --image_dir ./images/ \
    --json_file ./data/image_list.json \
    --model_path1 ./models/stage1_vlm/ \
    --model_path2 ./models/stage2_vlm/ \
    --model_path3 ./models/stage3_vlm/ \
    --yolo_model_path ./models/yolo_model.pt \
    --out_stage1 stage1_predictions.jsonl \
    --out_stage2 stage2_predictions.jsonl \
    --out_stage3 stage3_predictions.jsonl
```
### Arguments

* `--image_dir`: Path to the directory containing datasheet images.
* `--json_file`: Path to the JSON file that lists the images to be processed.
* `--model_path1`: Path to the directory of the Stage 1 Vision Language Model.
* `--model_path2`: Path to the directory of the Stage 2 Vision Language Model.
* `--model_path3`: Path to the directory of the Stage 3 Vision Language Model.
* `--yolo_model_path`: Path to the `.pt` file of the fine-tuned YOLO model.
* `--out_stage1`: (Optional) Output file name for Stage 1 results. Defaults to `stage1_output.jsonl`.
* `--out_stage2`: (Optional) Output file name for Stage 2 results. Defaults to `stage2_output.jsonl`.
* `--out_stage3`: (Optional) Output file name for Stage 3 results. Defaults to `stage3_output.jsonl`.
### Optional Flags for Debugging
* `--skip_yolo`: Run Stage 1 using only the VLM, without YOLO fusion. `--yolo_model_path` is not required when this flag is set.
* `--max_images N`: Process only the first N images from the JSON file. Useful for quick tests.
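
To catch path mistakes before any model is loaded, a pre-flight check can be run first. The `preflight` function below is a hypothetical helper, not part of `main_pipeline.py`; it encodes only the rules stated above and assumes that the model paths are directories and that the YOLO weights are a single file.

```python
import os
from typing import List, Optional


def preflight(image_dir: str, json_file: str, model_paths: List[str],
              yolo_model_path: Optional[str] = None,
              skip_yolo: bool = False) -> List[str]:
    """Hypothetical helper: return a list of problems with the arguments.
    An empty list means every path exists with the expected type."""
    problems = []
    if not os.path.isdir(image_dir):
        problems.append(f"--image_dir is not a directory: {image_dir}")
    if not os.path.isfile(json_file):
        problems.append(f"--json_file not found: {json_file}")
    for i, path in enumerate(model_paths, start=1):
        if not os.path.isdir(path):
            problems.append(f"--model_path{i} is not a directory: {path}")
    if not skip_yolo:
        # --yolo_model_path may only be omitted when --skip_yolo is set.
        if not yolo_model_path:
            problems.append("--yolo_model_path is required unless --skip_yolo is set")
        elif not os.path.isfile(yolo_model_path):
            problems.append(f"--yolo_model_path not found: {yolo_model_path}")
    return problems


if __name__ == "__main__":
    # Paths from the example command above.
    for problem in preflight("./images/", "./data/image_list.json",
                             ["./models/stage1_vlm/", "./models/stage2_vlm/",
                              "./models/stage3_vlm/"],
                             "./models/yolo_model.pt"):
        print(problem)
```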

## Output
The script generates three JSONL files, one for each of Stages 1 to 3. The names below are those used in the example command; without the `--out_stageN` arguments, the defaults listed above apply.
1. `stage1_predictions.jsonl`: Contains the located bounding box for each image.
2. `stage2_predictions.jsonl`: Contains the IC classification and the parameter extraction plan for each image.
3. `stage3_predictions.jsonl`: Contains the final extracted numerical parameters for each image.

Each line in these files is a JSON object containing the input prompt and the model's prediction for one image.
`stage3_predictions.jsonl` is then passed to `tool_call.py` (Stage 4) to generate detailed per-pin descriptions.
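
Since every output file is JSONL (one JSON object per line), the records can be inspected with the standard library alone. The field names inside each record are not documented here, so this sketch only prints the keys of each record; `read_jsonl` and the script name `inspect_jsonl.py` are illustrative.

```python
import json
import sys
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one JSON object per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_jsonl.py stage3_predictions.jsonl
    for record in read_jsonl(sys.argv[1]):
        print(sorted(record.keys()))
```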