# Models Release 8.5

This directory contains the core implementation files for physical property estimation from videos.

## File Descriptions

### Core Model Files

- **`ddpm3d_vjepa.py`**: Defines the DynamiCrafter model architecture and the V-JEPA model structure used for physical property estimation.

- **`gpt_physics_analyzer.py`**: Implements the GPT physics analyzer class, which uses multimodal large language models (MLLMs) to estimate physical properties from video frames.

- **`run_gpt_physics_tests.py`**: Main entry point for running MLLM-based physics analysis tests with various configurations and evaluation metrics.

- **`webvid.py`**: Dataset classes for loading and processing video data together with physics annotations and ground-truth labels.

- **`vjepa2_unet.py`**: Implementation of the pre-trained V-JEPA encoder, used to extract video features.

- **`openaimodel3d.py`**: 3D UNet model (following the UNet design of OpenAI's diffusion model code) used for UNet feature extraction.

## Usage Example

```bash
# Run physics analysis with GPT-4o model
python run_gpt_physics_tests.py \
    --model-version gpt-4o \
    --absolute-samples 100 \
    --relative-pairs 100 \
    --output-dir gpt_physics_analysis_results

# Run with a different MLLM (Gemini 2.5 Pro) and a custom frame size
python run_gpt_physics_tests.py \
    --model-version gemini-2.5-pro \
    --absolute-samples 100 \
    --relative-pairs 100 \
    --frame-size 640 480
```
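To evaluate several MLLMs under identical settings, the two commands above can be wrapped in a loop. The following is a minimal sketch, not a script shipped with this release: the function name `run_all_models`, the `DRY_RUN` switch, and the per-model output directory naming are illustrative assumptions. Only the flags that appear in the usage example are passed to `run_gpt_physics_tests.py`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the same MLLM evaluation for each model given as an argument,
# writing each model's results to its own output directory.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_all_models() {
  local model
  for model in "$@"; do
    # Build the command as an array so arguments are passed through unchanged.
    local cmd=(python run_gpt_physics_tests.py
      --model-version "$model"
      --absolute-samples 100
      --relative-pairs 100
      --output-dir "gpt_physics_analysis_results_${model}")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

For example, `DRY_RUN=1 run_all_models gpt-4o gemini-2.5-pro` prints the two commands without running them; dropping `DRY_RUN=1` executes them one after another.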

# Environment Setup

## Installation

1. Create and activate a new conda environment:
```bash
conda create -n your_own_env python=3.9
conda activate your_own_env
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```
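After installing the requirements, a quick import check can confirm that the environment is usable before launching long runs. `check_imports` below is a hypothetical helper, not part of the repository: it tries to import each module with the active interpreter, which defaults to `python` and can be overridden through an assumed `PYTHON` variable, and returns a non-zero status if any import fails.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which Python modules the active interpreter cannot import.
# Returns 0 if every module imports, 1 otherwise.
check_imports() {
  local py="${PYTHON:-python}" mod missing=0
  for mod in "$@"; do
    if "$py" -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_imports torch transformers cv2 decord kornia openai` covers most of the key dependencies (OpenCV is imported as `cv2`, and `openai` assumes the OpenAI Python package is used for GPT access). The Gemini client package is not named in this README, so it is left out.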

## Key Dependencies

- **PyTorch 2.7.0**: Deep learning framework
- **Transformers**: Hugging Face Transformers library for MLLMs
- **OpenCV 4.11.0**: Computer vision library
- **Decord 0.6.0**: Video loading and frame sampling library
- **Kornia 0.8.1**: Differentiable computer vision library for PyTorch
- **OpenAI API**: For GPT model access
- **Google Gemini API**: For Gemini model access
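The MLLM backends need API credentials at run time. How this repository reads them is not documented here, so the sketch below assumes they are supplied as environment variables. `OPENAI_API_KEY` is the variable the official OpenAI Python SDK reads by default; the variable used for Gemini (for example `GOOGLE_API_KEY`) is an assumption that depends on the client library. `check_api_keys` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that each named environment variable is set and non-empty.
# Prints one status line per variable; returns 1 if any is missing.
check_api_keys() {
  local var missing=0
  for var in "$@"; do
    # ${!var:-} is bash indirect expansion: the value of the variable whose name is in $var.
    if [ -n "${!var:-}" ]; then
      echo "set: ${var}"
    else
      echo "unset: ${var}"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_api_keys OPENAI_API_KEY GOOGLE_API_KEY` can be run before `run_gpt_physics_tests.py` to catch a missing key early instead of partway through an evaluation.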
