# KARMA: Knowledge-Aware Reward Mechanism Adjustment via Causal AI

This repository contains the implementation of the KARMA framework for enhancing reinforcement learning through dynamic reward adjustment based on domain knowledge and causal AI.

## Installation

### Prerequisites
- Python 3.8 or higher
- PyTorch 1.10.0 or higher
- CUDA (optional, for GPU acceleration)

### Install from source
```bash
git clone <repository-url>
cd karma-rl
pip install -e .
```

### Install dependencies only
```bash
pip install -r requirements.txt
```

## Quick Start

### Basic Usage

```python
from karma import KARMAAgent, KARMATrainer
import gym

# Configuration
config = {
    'state_dim': 64,
    'embedding_dim': 50,
    'hidden_dim': 32,
    'buffer_size': 100000,
    'min_buffer_size': 10000,
    'causal_update_frequency': 1000,
    'knowledge_update_frequency': 1000,
    'reward_adjustment': {
        'initial_wk0': 0.3,
        'initial_wc0': 0.7,
        'knowledge_decay_lambda': 0.0001,
        'causal_growth_lambda': 0.0001
    }
}

# Initialize agent
agent = KARMAAgent(config)

# Initialize environment and base RL algorithm
env = gym.make('CartPole-v1')
base_rl_algorithm = None  # Placeholder: replace with your RL algorithm (e.g. from stable-baselines3) before training

# Initialize trainer
trainer = KARMATrainer(agent, base_rl_algorithm, env)

# Train
trainer.train(num_episodes=5000)
```

### With Knowledge Graph

```python
# Define knowledge graph
knowledge_graph = {
    'entities': ['State', 'Action', 'Reward', 'Goal'],
    'relations': ['causes', 'leads_to', 'increases'],
    'triples': [
        ('Action', 'causes', 'State'),
        ('State', 'leads_to', 'Reward'),
        ('Goal', 'increases', 'Reward')
    ]
}

config['knowledge_graph'] = knowledge_graph
agent = KARMAAgent(config)
```

## Architecture

The KARMA framework consists of four main components:

### 1. Knowledge Representation (`knowledge_representation.py`)
- **TransE**: Knowledge graph embedding model
- **KnowledgeIntegrator**: Attention-based knowledge-state integration
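
TransE represents a relation as a translation in embedding space: for a plausible triple `(head, relation, tail)`, the head embedding plus the relation embedding should land close to the tail embedding. The snippet below is a minimal sketch of that scoring rule in plain Python. It does not use the repository's `TransE` class (whose API is not shown here), and the embedding values are hand-picked for illustration, not learned.

```python
import math

def transe_score(head, relation, tail):
    """Return the TransE distance ||h + r - t||_2 (lower means more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy 2-D embeddings, hand-picked for illustration only (not learned values).
embeddings = {
    'Action': [1.0, 0.0],
    'State': [1.0, 1.0],
    'Reward': [3.0, 3.0],
}
causes = [0.0, 1.0]  # hypothetical embedding of the 'causes' relation

# ('Action', 'causes', 'State') should score lower than ('Action', 'causes', 'Reward').
plausible = transe_score(embeddings['Action'], causes, embeddings['State'])
implausible = transe_score(embeddings['Action'], causes, embeddings['Reward'])
print(plausible, implausible)
```

During training, TransE pushes the distance down for triples in the knowledge graph and up for corrupted triples, so the learned embeddings encode which relations hold.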

### 2. Causal Learning (`causal_learning.py`)
- **KnowledgeConstrainedPC**: Modified PC algorithm with knowledge constraints
- **SCMBuilder**: Structural causal model construction
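
One common way to use background knowledge in constraint-based causal discovery is to fix the direction of edges that the knowledge already asserts. The sketch below illustrates that idea only; it is not the repository's `KnowledgeConstrainedPC`. The helper names and the choice of which relations count as causal are assumptions made for this example.

```python
# Assumption for illustration: these relation names are treated as causal.
CAUSAL_RELATIONS = {'causes', 'leads_to', 'increases'}

def required_edges(triples, causal_relations=CAUSAL_RELATIONS):
    """Collect directed (head, tail) pairs asserted by causal triples."""
    return {(h, t) for h, r, t in triples if r in causal_relations}

def orient_with_knowledge(skeleton, required):
    """Orient undirected skeleton edges that match a required edge.

    `skeleton` is a set of 2-element frozensets (undirected edges).
    Returns (directed_edges, still_undirected_edges).
    """
    directed, undirected = set(), set()
    for edge in skeleton:
        a, b = tuple(edge)
        if (a, b) in required:
            directed.add((a, b))
        elif (b, a) in required:
            directed.add((b, a))
        else:
            undirected.add(frozenset(edge))
    # Required edges missing from the skeleton are added back as directed edges.
    directed |= required
    return directed, undirected

triples = [
    ('EnergySource', 'increases', 'Reward'),
    ('ShinyRock', 'correlates_with', 'EnergySource'),  # not causal: ignored
    ('Goal', 'increases', 'Reward'),
]
skeleton = {
    frozenset({'EnergySource', 'Reward'}),
    frozenset({'ShinyRock', 'Reward'}),
}
directed, undirected = orient_with_knowledge(skeleton, required_edges(triples))
```

Here the `ShinyRock`/`Reward` edge stays undirected because the knowledge graph only records a correlation for `ShinyRock`, so its direction is left to the statistical tests.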

### 3. Reward Adjustment (`reward_adjustment.py`)
- **compute_knowledge_reward**: Knowledge-based reward computation
- **compute_causal_reward**: Counterfactual reward computation
- **combine_rewards_dynamically**: Dynamic reward combination

### 4. KARMA Agent (`karma_agent.py`)
- **KARMAAgent**: Main agent class integrating all components
- **KARMATrainer**: Training loop with base RL algorithm integration

## Configuration

### Agent Configuration
```python
config = {
    # Model dimensions
    'state_dim': 64,           # State representation dimension
    'embedding_dim': 50,       # Knowledge embedding dimension
    'hidden_dim': 32,          # Hidden layer dimension
    
    # Buffer settings
    'buffer_size': 100000,     # Experience buffer size
    'min_buffer_size': 10000,  # Minimum buffer size before adjustment
    'batch_size': 32,          # Training batch size
    
    # Update frequencies
    'causal_update_frequency': 1000,     # Episodes between causal model updates
    'knowledge_update_frequency': 1000,  # Episodes between knowledge updates
    
    # Reward adjustment parameters
    'reward_adjustment': {
        'initial_wk0': 0.3,                    # Initial knowledge weight
        'initial_wc0': 0.7,                    # Initial causal weight
        'knowledge_decay_lambda': 0.0001,      # Knowledge weight decay rate
        'causal_growth_lambda': 0.0001,        # Causal weight growth rate
        'reward_min': -10.0,                   # Minimum reward value
        'reward_max': 10.0                     # Maximum reward value
    }
}
```
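
To make the `reward_adjustment` parameters concrete, here is one plausible reading of how they could interact: the knowledge weight decays exponentially from `initial_wk0`, the causal weight grows toward 1 from `initial_wc0`, and the combined reward is clipped to `[reward_min, reward_max]`. The exponential form and the function names below are assumptions for illustration; the actual formula is defined in `reward_adjustment.py`.

```python
import math

def reward_weights(episode, wk0=0.3, wc0=0.7, decay_lambda=1e-4, growth_lambda=1e-4):
    """Assumed schedule: knowledge weight decays, causal weight grows toward 1."""
    w_k = wk0 * math.exp(-decay_lambda * episode)
    w_c = 1.0 - (1.0 - wc0) * math.exp(-growth_lambda * episode)
    return w_k, w_c

def combine_rewards(env_reward, knowledge_reward, causal_reward, episode,
                    reward_min=-10.0, reward_max=10.0, **schedule):
    """Add the weighted shaping terms to the environment reward, then clip."""
    w_k, w_c = reward_weights(episode, **schedule)
    total = env_reward + w_k * knowledge_reward + w_c * causal_reward
    return max(reward_min, min(reward_max, total))

# Early in training the knowledge term carries more weight than it does later.
early = reward_weights(0)
late = reward_weights(10000)
clipped = combine_rewards(100.0, 1.0, 1.0, episode=0)
```

Under this reading, larger lambda values shift influence from prior knowledge to the learned causal model more quickly, and the clipping bounds keep the shaped reward from dominating the base algorithm's updates.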

### Knowledge Graph Format
```python
knowledge_graph = {
    'entities': ['Entity1', 'Entity2', ...],
    'relations': ['relation1', 'relation2', ...],
    'triples': [
        ('head_entity', 'relation', 'tail_entity'),
        ...
    ]
}
```
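
Because every triple refers to entities and relations by name, a misspelled or undeclared name is easy to miss. The helper below is a small sketch of a consistency check you could run before passing the graph to the agent. `validate_knowledge_graph` is a hypothetical function written for this example, not part of the `karma` package, and whether `KARMAAgent` performs a similar check itself is not assumed here.

```python
def validate_knowledge_graph(graph):
    """Return a list of problems found in a knowledge-graph dict (empty if none)."""
    entities = set(graph.get('entities', []))
    relations = set(graph.get('relations', []))
    problems = []
    for triple in graph.get('triples', []):
        if len(triple) != 3:
            problems.append(f"{triple!r}: expected (head, relation, tail)")
            continue
        head, relation, tail = triple
        # Both ends of the triple must be declared entities.
        for role, name in (('head', head), ('tail', tail)):
            if name not in entities:
                problems.append(f"{triple!r}: {role} '{name}' is not in 'entities'")
        if relation not in relations:
            problems.append(f"{triple!r}: relation '{relation}' is not in 'relations'")
    return problems

example_graph = {
    'entities': ['State', 'Action', 'Goal'],
    'relations': ['causes', 'leads_to'],
    'triples': [
        ('Action', 'causes', 'State'),
        ('State', 'leads_to', 'Reward'),  # 'Reward' is deliberately undeclared
    ],
}
problems = validate_knowledge_graph(example_graph)
for problem in problems:
    print(problem)
```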

## Examples

### GridWorld Environment
```python
from karma import KARMAAgent
from environments.gridworld import GridWorldEnv

# Create environment with causal interference
env = GridWorldEnv(size=10, causal_features=True)

# Define domain knowledge
knowledge_graph = {
    'entities': ['EnergySource', 'ShinyRock', 'Goal', 'Agent', 'Reward'],
    'relations': ['increases', 'correlates_with', 'leads_to'],
    'triples': [
        ('EnergySource', 'increases', 'Reward'),
        ('ShinyRock', 'correlates_with', 'EnergySource'),
        ('Goal', 'increases', 'Reward')
    ]
}

config = {
    'state_dim': env.observation_space.shape[0],
    'num_actions': env.action_space.n,
    'knowledge_graph': knowledge_graph
}

agent = KARMAAgent(config)
```

### Robot Manipulation
```python
from karma import KARMAAgent
from environments.robot_env import RobotManipulationEnv

# Create robot environment
env = RobotManipulationEnv()

# Define manipulation knowledge
knowledge_graph = {
    'entities': ['Cube', 'Cylinder', 'SideGrasp', 'TopGrasp', 'Stable'],
    'relations': ['best_grasp', 'leads_to', 'property_of'],
    'triples': [
        ('Cylinder', 'best_grasp', 'SideGrasp'),
        ('Cube', 'best_grasp', 'TopGrasp'),
        ('SideGrasp', 'leads_to', 'Stable')
    ]
}

config = {
    'state_dim': env.observation_space.shape[0],
    'knowledge_graph': knowledge_graph,
    'reward_adjustment': {
        'initial_wk0': 0.5,  # Higher knowledge weight for manipulation
        'initial_wc0': 0.5
    }
}

agent = KARMAAgent(config)
```

## Testing

Run the test suite:
```bash
python -m pytest tests/
```

Run specific tests:
```bash
python -m pytest tests/test_knowledge_representation.py
python -m pytest tests/test_causal_learning.py
python -m pytest tests/test_reward_adjustment.py
```

## Performance Monitoring

The framework supports integration with popular monitoring tools:

### TensorBoard
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/karma_experiment')
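# Log metrics inside your training loop; episode_reward, knowledge_weight,
# causal_weight, and episode are values your loop is expected to provide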
writer.add_scalar('Reward/Episode', episode_reward, episode)
writer.add_scalar('Knowledge_Weight', knowledge_weight, episode)
writer.add_scalar('Causal_Weight', causal_weight, episode)
```

### Weights & Biases
```python
import wandb

wandb.init(project="karma-rl")
# Call once per episode; the logged values come from your training loop
wandb.log({
    "episode_reward": episode_reward,
    "knowledge_weight": knowledge_weight,
    "causal_weight": causal_weight
})
```

## Troubleshooting

### Common Issues

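1. **Memory Issues**: Reduce `buffer_size` or `batch_size` if you run out of memory.
2. **Slow Training**: Increase `causal_update_frequency` and `knowledge_update_frequency`. Both values are the number of episodes between updates, so larger values mean the causal model and knowledge embeddings are updated less often.
3. **Poor Performance**: Check that the knowledge graph triples are correct and relevant to the task, then adjust the reward weights (`initial_wk0`, `initial_wc0`).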

### Debug Mode
```python
config['debug'] = True  # Enable debug logging
agent = KARMAAgent(config)
```

## Citation

If you use this code in your research, please cite:

```bibtex
@inproceedings{karma,
    title={KARMA: Knowledge-Aware Reward Mechanism Adjustment via Causal Inference},
    author={Anonymous},
    booktitle={AAAI Conference on Artificial Intelligence},
    year={2026}
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- The causal discovery implementation is based on the PC algorithm from pgmpy
- Knowledge graph embeddings use the TransE model
- Base RL algorithms can be integrated from stable-baselines3

