# README: LLM Strategic Reasoning Evaluation

## Project Overview
This project evaluates the strategic reasoning capabilities of Large Language Models (LLMs) through behavioral game theory frameworks. It consists of two main components:

1. Parameter Estimation (Python): Estimates reasoning depth and decision-making stochasticity using strategic games.
2. Demographic Feature Evaluation (Stata): Assesses how demographic embeddings influence reasoning using regression analysis.

---
## 1. Parameter Estimation (Python)
### Description
This step estimates reasoning parameters based on observed behavior in payoff matrix games. The model assumes bounded rationality, where decisions are made probabilistically rather than purely optimally. A log-likelihood optimization process is used to fit the model to observed data.
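
To make the idea of probabilistic choice and likelihood fitting concrete, here is a minimal sketch. It is not the code in `parameter_estimation.py`: it assumes a single logit (softmax) precision parameter `lam`, and the utilities, choice counts, and function names (`logit_probs`, `logit_log_likelihood`) are hypothetical. A coarse grid search stands in for a numerical optimizer so the example needs only the standard library.

```python
import math

def logit_probs(utilities, lam):
    """Softmax choice probabilities; lam = 0 is uniform, large lam approaches best response."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(lam * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def logit_log_likelihood(lam, utilities, counts):
    """Log-likelihood of observed choice counts under the logit model."""
    probs = logit_probs(utilities, lam)
    return sum(c * math.log(p) for c, p in zip(counts, probs))

# Hypothetical data: expected utility of three actions and how often each was chosen.
utilities = [3.0, 2.0, 0.5]
counts = [60, 30, 10]

# Grid search over lam in [0, 5] with step 0.01 (an assumption made for simplicity).
grid = [i / 100 for i in range(501)]
lam_hat = max(grid, key=lambda lam: logit_log_likelihood(lam, utilities, counts))
print(f"estimated precision: {lam_hat:.2f}")
```

A higher fitted `lam` means choices track expected utility more closely; a value near zero means choices are close to random.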

### Files & Code
- `parameter_estimation.py`
  - Defines functions to compute expected utilities and probability distributions for different reasoning levels.
  - Implements log-likelihood maximization to estimate reasoning parameters.
  - Runs estimation for multiple strategic games.

- Key Functions
  - `level_k_choice_probabilities()`: Computes decision probabilities based on reasoning depth.
  - `calculate_expected_utilities()`: Computes expected payoffs based on opponent choices.
  - `aggregate_choice_probabilities()`: Aggregates reasoning probabilities across different levels.
  - `log_likelihood()`: Defines the log-likelihood function for optimization.
  - `estimate_parameters_for_game()`: Runs the full estimation pipeline for each game.
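
The signatures of the functions listed above are not documented in this README, so the following is only an illustrative sketch of how per-level choice probabilities might be computed and aggregated. It assumes a symmetric game, a uniform level 0, each level responding with a logit rule to the level directly below it, and Poisson(tau) weights over levels truncated at `max_level`. All names and the payoff matrix are hypothetical, and plain lists are used instead of NumPy arrays to keep the example dependency-free.

```python
import math

def expected_utilities(payoffs, opponent_probs):
    """Expected payoff of each own action (row) against a mixed opponent strategy."""
    return [sum(u * q for u, q in zip(row, opponent_probs)) for row in payoffs]

def logit_response(utilities, lam):
    """Softmax response with precision lam."""
    top = max(utilities)
    weights = [math.exp(lam * (u - top)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def level_probabilities(payoffs, max_level, lam):
    """Choice probabilities for levels 0..max_level in a symmetric game."""
    n_actions = len(payoffs)
    levels = [[1.0 / n_actions] * n_actions]  # level 0 chooses uniformly
    for _ in range(max_level):
        utilities = expected_utilities(payoffs, levels[-1])
        levels.append(logit_response(utilities, lam))
    return levels

def poisson_weights(tau, max_level):
    """Poisson(tau) weights over levels, renormalized after truncation."""
    raw = [math.exp(-tau) * tau ** k / math.factorial(k) for k in range(max_level + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def aggregate(levels, weights):
    """Population-level choice probabilities as a weighted mixture of levels."""
    n_actions = len(levels[0])
    return [sum(w * probs[a] for w, probs in zip(weights, levels)) for a in range(n_actions)]

# Hypothetical symmetric 2x2 game: payoffs[i][j] is the payoff of action i against action j.
payoffs = [[3.0, 0.0], [5.0, 1.0]]
levels = level_probabilities(payoffs, max_level=4, lam=1.0)
mixture = aggregate(levels, poisson_weights(tau=1.5, max_level=4))
print(mixture)
```

In this setup a larger `tau` shifts weight toward higher reasoning levels, while `tau = 0` puts all weight on the uniform level-0 player.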

### Input & Output
- Input:
  - Payoff matrices (stored as NumPy arrays).
  - Observed player decisions (loaded from CSV files).

- Output:
  - CSV file (`tqre_estimation_results.csv`) containing the estimated reasoning depth (tau).



---
## 2. Demographic Feature Evaluation (Stata)
### Description
This component examines the impact of demographic attributes on LLM decision-making. The analysis involves regression models that assess whether age, gender, education, race, and other factors influence reasoning depth.

### Analysis Steps
1. Prepare Data:
   - Load demographic dataset (`all_demographic.dta`).
   - Generate dummy variables for categorical features (e.g., age groups, gender, education, political affiliation).
2. Run Regression Models:
   - Perform OLS regressions to analyze the impact of demographic variables on estimated reasoning depth (tau).
   - Separate models for baseline and CoT-enhanced reasoning.
3. Visualize Results:
   - Generate coefficient plots to illustrate demographic influences on decision-making.

### Files & Code
- `demographic_analysis.do`
  - Loads demographic dataset.
  - Generates dummy variables.
  - Runs regression analysis for each LLM.
  - Produces coefficient plots.
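
The analysis itself is run in Stata, but the dummy-variable and OLS steps can be illustrated with a small Python analogue. This is a sketch under stated assumptions, not a translation of `demographic_analysis.do`: the `age_group` categories, the `tau` values, and the helper names are hypothetical, and the regression uses a single categorical feature solved through the normal equations.

```python
def make_dummies(values, baseline):
    """Return category names and 0/1 indicator rows, omitting the baseline category."""
    categories = sorted(set(values) - {baseline})
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot_row = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical data: one demographic feature and the estimated reasoning depth.
age_group = ["18-29", "18-29", "30-49", "30-49", "50+", "50+"]
tau = [1.0, 1.2, 1.5, 1.7, 0.8, 1.0]

categories, dummies = make_dummies(age_group, baseline="18-29")
X = [[1.0] + d for d in dummies]  # prepend an intercept column
coefs = ols(X, tau)
names = ["intercept"] + categories
for name, coef in zip(names, coefs):
    print(f"{name}: {coef:.3f}")
```

With one categorical feature and an intercept, the intercept equals the mean tau of the omitted baseline group, and each dummy coefficient is that group's mean tau minus the baseline mean.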


### Input & Output
- Input:
  - `all_demographic.dta`: The dataset containing demographic attributes and estimated reasoning scores.

- Output:
  - Regression results stored in:
    - `regression_results.csv`
    - `regression_results_cot.csv` (for models with CoT)
  - Plots visualizing demographic effects on reasoning.




