# SYNTHIA: A Multi-Agent LLM Framework for Statistically Driven Synthetic Data Generation

SYNTHetic Intelligence Architecture (SYNTHIA) is a multi-agent, GAN-inspired framework for high-fidelity **synthetic tabular data generation** using large language models (LLMs) with adversarial refinement and statistical validation. It can produce realistic, diverse, and privacy-conscious synthetic datasets suitable for domains such as healthcare, finance, and behavioral analytics.

---

## Installation

Clone the repository and install the required dependencies:

```bash
git clone <your-repo-url>
cd SYNTHIA
pip install -r requirements.txt
```

---

## Environment Setup

Create a `.env` file in the root directory to define environment variables:

```bash
touch .env
echo "PATH_NAME=your_dataset_path" >> .env
echo "OPENAI_API_KEY=your_openai_key" >> .env  # Only needed if using OpenAI models
```

Alternatively, open the file in an editor and add the variables manually:

```bash
nano .env
```
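
Before the first run it can help to confirm that the `.env` file contains what the pipeline expects. The following is a minimal sketch using only the Python standard library. It is not SYNTHIA's own loader: the function names `read_env_file` and `missing_settings` are hypothetical, and the sketch assumes the file holds plain `KEY=VALUE` lines as written by the commands above.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Returns an empty dict if the file does not exist.
    """
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = {}
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values, using_openai=False):
    """Return the names of required settings that are absent or empty."""
    required = ["PATH_NAME"]
    if using_openai:
        # The key is only needed when OpenAI models are used.
        required.append("OPENAI_API_KEY")
    return [name for name in required if not values.get(name)]
```

Calling `missing_settings(read_env_file(".env"))` returns an empty list when `PATH_NAME` is set; pass `using_openai=True` to also require `OPENAI_API_KEY`.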

---

## Running SYNTHIA Core

After installation and environment setup, run the SYNTHIA pipeline:

```bash
python main.py
```

This will execute the full workflow:

1. Metadata extraction and regex schema formation  
2. Generator LLM prompt construction and row synthesis  
3. Statistical analyzer evaluation  
4. Discriminator feedback and iterative refinement (GAN-style loop)  
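
To make the control flow of these four stages concrete, here is a toy sketch in plain Python. It is an illustration under stated assumptions, not the code in `main.py`: every function name is hypothetical, the "generator" is a random sampler standing in for the LLM, the "analyzer" compares only column means, and the input is assumed to be a list of dicts with numeric values.

```python
import random
import re
import statistics

NUMBER = re.compile(r"-?\d+(\.\d+)?")


def extract_schema(rows):
    """Stage 1: record a regex and the observed range for each numeric column."""
    schema = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        schema[col] = {"regex": NUMBER, "lo": min(values), "hi": max(values)}
    return schema


def generate_rows(schema, n, hints, rng):
    """Stage 2: stand-in for the generator LLM; samples around a hinted centre."""
    rows = []
    for _ in range(n):
        row = {}
        for col, spec in schema.items():
            centre = hints.get(col, (spec["lo"] + spec["hi"]) / 2)
            spread = (spec["hi"] - spec["lo"]) / 4 or 1.0
            # Clip each sample to the range observed in the real data.
            value = min(max(rng.gauss(centre, spread), spec["lo"]), spec["hi"])
            row[col] = round(value, 2)
        rows.append(row)
    return rows


def conforms(row, schema):
    """Check every value against its column regex."""
    return all(schema[col]["regex"].fullmatch(str(v)) for col, v in row.items())


def analyze(real, synthetic, schema):
    """Stage 3: compare column means; the gap is relative to the real range."""
    report = {}
    for col, spec in schema.items():
        real_mean = statistics.fmean([r[col] for r in real])
        synth_mean = statistics.fmean([r[col] for r in synthetic])
        span = (spec["hi"] - spec["lo"]) or 1.0
        report[col] = {"target": real_mean, "gap": abs(real_mean - synth_mean) / span}
    return report


def discriminate(report, tolerance):
    """Stage 4: return per-column hints for columns whose gap is too large."""
    return {col: r["target"] for col, r in report.items() if r["gap"] > tolerance}


def run_pipeline(real, n=200, rounds=5, tolerance=0.05, seed=0):
    """Run generate -> analyze -> discriminate until accepted or out of rounds."""
    rng = random.Random(seed)
    schema = extract_schema(real)
    hints = {}
    for round_no in range(1, rounds + 1):
        batch = generate_rows(schema, n, hints, rng)
        synthetic = [row for row in batch if conforms(row, schema)]
        feedback = discriminate(analyze(real, synthetic, schema), tolerance)
        if not feedback:
            break  # every column is within tolerance
        hints.update(feedback)  # feed the discriminator's hints to the generator
    return synthetic, round_no
```

`run_pipeline(rows)` returns the last synthetic batch and the number of rounds used. The point of the sketch is the feedback loop: the discriminator's output becomes input to the next generation round, which is the GAN-inspired part of the design.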

---

## Running the SYNTHIA UI

SYNTHIA includes a **web-based interface** for uploading datasets and visualizing synthetic data generation:

```bash
cd UI/file-uploader
npm install
npm start
```

Then navigate to the local URL provided (typically `http://localhost:3000`) to interact with the interface.

---

## Minimum System Requirements

- **OS**: Linux or macOS (Windows is supported via WSL for local models)  
- **RAM**: 16 GB minimum (32 GB recommended)  
- **VRAM**: 8 GB+ recommended for GPU inference  
- **Python**: 3.8+  
- **Node.js**: v16+ for the UI  

### Example Compatible Hardware
- **GPUs**: NVIDIA RTX 3060 / A4000+ or AMD RX 6800+  
- **CPUs**: Intel i7-9700K+ or AMD Ryzen 7 3700X+  

> GPU acceleration is strongly recommended for real-time or large-scale synthetic generation.
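
The Python and Node.js version requirements above can be checked with a short preflight script. This is a hedged sketch rather than a tool shipped with SYNTHIA: the function names are hypothetical, it relies only on the standard library, and it does not check RAM or VRAM, which would need a third-party package or vendor tooling.

```python
import re
import shutil
import subprocess
import sys


def python_ok(minimum=(3, 8)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def parse_node_major(version_text):
    """Extract the major version from output such as 'v16.20.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    return int(match.group(1)) if match else None


def node_major():
    """Return the installed Node.js major version, or None if node is absent."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True)
    return parse_node_major(result.stdout)


def preflight():
    """Summarize whether the Python and Node.js requirements are met."""
    major = node_major()
    return {
        "python>=3.8": python_ok(),
        "node>=16": major is not None and major >= 16,
    }
```

`preflight()` returns a dict of booleans, so a setup script can print it or exit early when a requirement is not met.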
