# A11yn: Aligning LLMs for Accessible Web UI Code Generation

A11yn (pronounced "uh-line") is the first reinforcement learning-based framework that aligns code-generating large language models (LLMs) with accessibility standards to produce WCAG-compliant, visually appealing web UI code. A11yn optimizes code generation for accessibility using Group-Relative Policy Optimization (GRPO) and a custom reward function built on WCAG testing with axe-core.

## Project Structure

```
A11yn/
├── A11yn_train.py            # Main training script
├── a11yn_train.sh            # Training launch script
├── accessibility_reward.py   # Accessibility reward functions
├── axe.min.js                # axe-core accessibility testing library
├── data/
│   ├── UIReq6.8K/            # Training datasets
│   └── RealUIReq300/         # Evaluation datasets
├── weight/
│   └── A11yn_ckpt/           # Model checkpoints
├── deepspeed_zero3.yaml      # DeepSpeed configuration
├── requirements.txt          # Python dependencies
└── accelerate_config.yaml    # Accelerate configuration
```

## Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd A11yn
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Install Playwright browsers**
   ```bash
   playwright install chromium
   ```

4. **Set up Hugging Face authentication**
   ```python
   # Add your Hugging Face token to A11yn_train.py
   huggingface_token = "your_token_here"
   ```
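Hardcoding a token in a tracked file makes it easy to commit by accident. As a minimal sketch, assuming you are free to edit `A11yn_train.py`, the token could instead be read from the `HF_TOKEN` environment variable. The helper name `resolve_hf_token` is hypothetical and not part of this repository.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in an environment variable.

    Hypothetical helper (not part of this repository): raises a clear
    error instead of silently continuing without credentials.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before training.")
    return token


# In A11yn_train.py, the hardcoded assignment could then become:
# huggingface_token = resolve_hf_token()
```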

## Usage

### Training
`Qwen/Qwen2.5-Coder-7B-Instruct` is used as the base model for training. Detailed training configurations are in the `a11yn_train.sh` file.
Launch training with the provided shell script:

```bash
bash a11yn_train.sh
```

### Checkpoints
The trained checkpoints are stored in the `weight/A11yn_ckpt` directory.
Use the following prompt for the web UI code generation task, replacing `{user_request}` with the user's query.
```text
You are an expert UI designer assistant.
You should plan the design based on the user request. 
Show the plan in the `<think>` tag.
    - You must think about the html structure and widgets needed to fulfill the user request.
    - You must think about the Tailwind CSS classes to use for styling.
Then, you should generate a complete HTML document that includes:
    - A `<head>` section with a `<meta charset="UTF-8">` tag
    - A `<meta name="viewport" content="width=device-width, initial-scale=1.0">` tag  
    - A proper tailwind css link tag to load Tailwind CSS from CDN
    - A `<body>` section that contains the complete HTML structure and content
The HTML document should be visually appealing, well-structured, and content/semantically-rich.
You must strictly follow the output format shown below:
<think>
...
</think>

<answer>
<html>
<head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    ...
</head>
<body>
...
</body>
</html>
</answer>

User: {user_request}
Assistant:
```
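The prompt above can be filled in and its output parsed with a few lines of Python. The following is a minimal sketch, not code from this repository: `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt, and `build_prompt` and `extract_html` are hypothetical helper names.

```python
import re
from typing import Optional

# Abbreviated stand-in for the full prompt above; only the placeholder matters here.
PROMPT_TEMPLATE = (
    "You are an expert UI designer assistant.\n"
    "...\n"
    "User: {user_request}\n"
    "Assistant:"
)

_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)


def build_prompt(user_request: str, template: str = PROMPT_TEMPLATE) -> str:
    """Substitute the user query into the template.

    str.replace is used instead of str.format so that any literal braces
    in the template or the request are left untouched.
    """
    return template.replace("{user_request}", user_request)


def extract_html(completion: str) -> Optional[str]:
    """Return the HTML inside the first <answer>...</answer> pair, or None."""
    match = _ANSWER_RE.search(completion)
    return match.group(1).strip() if match else None
```

Returning `None` when no `<answer>` block is found lets the caller decide how to treat completions that do not follow the required output format.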

***Important Note***: Some generated HTML outputs may include deprecated or non-functional Tailwind CSS CDN links.
To ensure proper rendering, replace any outdated Tailwind CDN link with the following tag:
```html
<script src="https://cdn.jsdelivr.net/npm/@tailwindcss/browser@4"></script>
```
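This replacement can be automated when post-processing many generated pages. The sketch below is one possible approach under simple assumptions: it treats any `<link>` tag or `<script src=...>` tag whose attributes mention `tailwind` as a Tailwind CDN reference, removes it, and inserts the tag above before `</head>`. The function name `fix_tailwind_cdn` is hypothetical, and regex-based HTML rewriting may miss unusual markup.

```python
import re

TAILWIND_TAG = '<script src="https://cdn.jsdelivr.net/npm/@tailwindcss/browser@4"></script>'

# Heuristic patterns (assumption): any <link> or <script src> mentioning "tailwind".
_LINK_RE = re.compile(r"<link\b[^>]*tailwind[^>]*>", re.IGNORECASE)
_SCRIPT_RE = re.compile(
    r"<script\b[^>]*\bsrc=[\"'][^\"']*tailwind[^\"']*[\"'][^>]*>\s*</script>",
    re.IGNORECASE,
)
_HEAD_CLOSE_RE = re.compile(r"</head>", re.IGNORECASE)


def fix_tailwind_cdn(html: str) -> str:
    """Drop existing Tailwind CDN references and add the current one once."""
    cleaned = _SCRIPT_RE.sub("", _LINK_RE.sub("", html))
    if _HEAD_CLOSE_RE.search(cleaned):
        # Insert the tag just before the first closing </head>.
        return _HEAD_CLOSE_RE.sub(
            lambda m: TAILWIND_TAG + "\n" + m.group(0), cleaned, count=1
        )
    # No <head> section found: prepend the tag so the page still loads Tailwind.
    return TAILWIND_TAG + "\n" + cleaned
```

Because existing references are removed before the tag is added, running the function on an already-fixed page still leaves exactly one Tailwind tag.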

## Method Overview

A11yn trains a code-generating language model to natively produce accessible HTML through post-training alignment. Instead of relying on hand-crafted prompts or iterative fixes, A11yn treats accessibility as a learnable objective optimized with reinforcement learning. The model is trained to generate compliant UI code using a custom reward signal based on WCAG violation severity, implemented in `accessibility_reward.py`, and is fine-tuned with Group-Relative Policy Optimization (GRPO) via `GRPOTrainer`, which provides stable, critic-free policy updates.
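To illustrate what "critic-free" means here: GRPO samples a group of completions for the same prompt and scores each one relative to the rest of the group, rather than against a learned value function. The sketch below shows that normalization in plain Python. It is illustrative only; `GRPOTrainer` handles this internally, and the function name, the use of the population standard deviation, and the `eps` value are assumptions made for this example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so the
    group itself serves as the baseline and no critic model is needed.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions with above-average accessibility reward receive positive advantages and are reinforced. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.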

### Accessibility Reward

Implemented in `accessibility_reward.py`, the reward function works as follows:

#### Reward Computation

- The axe-core testing library (`axe.min.js`) is injected into a headless Chromium browser using Playwright, a browser automation tool.
- WCAG (Web Content Accessibility Guidelines) violations are detected and grouped by severity.
- A penalty is computed from the number and impact of the violations, using the following weights:
```python
impact_weights = {
    'minor':    0.1,
    'moderate': 0.2,
    'serious':  0.3,
    'critical': 0.4
}
```
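- The final reward is the maximum score of 2.0 minus the penalty, floored at zero:
```python
# violation_counts maps each impact level to its number of violations (illustrative name)
penalty = sum(impact_weights[impact] * count for impact, count in violation_counts.items())
reward = max(0.0, 2.0 - penalty)
```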
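The steps above can be sketched end to end as follows. This is a hedged illustration rather than the contents of the repository's reward module: the function names are hypothetical, each entry in the axe-core `violations` array is assumed to count once regardless of how many DOM nodes it affects, and entries with a missing or unknown impact are assumed to add no penalty.

```python
from collections import Counter

IMPACT_WEIGHTS = {"minor": 0.1, "moderate": 0.2, "serious": 0.3, "critical": 0.4}
MAX_REWARD = 2.0


def accessibility_reward(violations: list[dict]) -> float:
    """Turn axe-core violation entries into a reward in [0.0, 2.0].

    Assumption: each entry in the `violations` array counts once,
    regardless of how many DOM nodes it affects.
    """
    counts = Counter(v.get("impact") for v in violations)
    # Unknown or missing impact levels contribute no penalty (assumption).
    penalty = sum(IMPACT_WEIGHTS.get(impact, 0.0) * n for impact, n in counts.items())
    return max(0.0, MAX_REWARD - penalty)


def run_axe(html: str, axe_path: str = "axe.min.js") -> list[dict]:
    """Render `html` in headless Chromium and return axe-core violations."""
    # Imported lazily so the reward arithmetic above works without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.set_content(html)
            page.add_script_tag(path=axe_path)  # inject axe-core into the page
            results = page.evaluate("() => axe.run()")  # waits for the Promise
        finally:
            browser.close()
    return results["violations"]
```

With these helpers, a single completion would be scored as `accessibility_reward(run_axe(html))`. For example, one critical, one moderate, and one minor violation give a penalty of 0.7 and a reward of about 1.3.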

## Dataset

- `uireq6800.json` in the `data/UIReq6.8K` directory contains the 6,800 queries used for training.

- `realuirequest300.json` in the `data/RealUIReq300` directory contains the evaluation queries, which are grounded in 300 real-world web UI use cases.

## Monitoring

The project integrates with Weights & Biases (wandb) for experiment tracking:
- Training metrics such as reward, KL divergence, and loss
- Logs of generated completions

## Requirements

- Python 3.10.16 (or compatible version)
- CUDA-compatible GPU for training
- Node.js (for axe-core)
- Sufficient memory for 7B parameter model

## Acknowledgments

- Built on Hugging Face Transformers and `GRPOTrainer` in TRL
- Uses Qwen2.5-Coder-7B-Instruct as the base model
- Accessibility testing powered by axe-core
- Training acceleration via DeepSpeed
- The axe-core license is included in the `axe.min.js` file.
