# README

## Introduction
This repository contains the implementation of the method described in the paper "Beyond Expert-Annotated Labels: An Adaptive Label Learning Method for Knowledge Tracing" (ALL4KT). It includes the source code for the ALL4KT method, datasets, and baseline methods for comparison.

## Directory Structure
- **all4kt**: This folder contains the implementation of the ALL4KT method.
  - **main.py**: The main script to run the ALL4KT method.
  - **...**: Other supporting files for the ALL4KT method.

- **dataset**: This folder contains the datasets used in the experiments.
  - **[dataset_name]**: Each dataset folder contains the following files:
    - **init_data.py**: A script that processes the raw data and generates the data files listed below. The raw data can be downloaded from the address given in the paper, but the processed files are already included here, so downloading it is optional.
    - **user.csv**: The mapping between users in the original data and users in our data.
    - **question.csv**: The mapping between questions in the original data and questions in our data.
    - **skill.csv**: The mapping between KCs in the original data and KC groups in our data. Note that this is the dataset's default grouping, not the final grouping output by our model.
    - **record.csv**: The interaction data, describing which user (first column) answered which question (second column) and the result (third column).
    - **question_skill.csv**: The correspondence between questions and KC groups.
    - **question_group.csv**: The optimal question grouping generated by ALL4KT, which can be used for subsequent experiments with baselines.
    - **all.pkl**: The repackaged data provided for baseline experiments, including fields such as user, question, KC, and group.

- **baseline**: This folder contains the implementation of the baseline methods.
  - **data_trans.py**: A script to process the `record.csv` file and generate the `all.pkl` file.
  - **data_load.py**: A script to load the `all.pkl` data.
  - **train.py**: The training script to run the baseline models. It can be used with various parameters to customize the training process.
  - **[kt.py]**: The implementation of each baseline method.

- **result**: This folder is used to save the experimental results by default.

## Datasets
The datasets used in this study are as follows:

| Dataset     | Learners | Questions | KCs/Groups | Interactions |
| ----------- | -------- | --------- | ---------- | ------------ |
| ASSIST2009  | 4,029    | 16,888    | 110/137    | 325,515      |
| ASSIST2012  | 28,118   | 53,084    | 265/265    | 2,710,913    |
| Algebra2005 | 574      | 172,994   | 113/436    | 606,401      |
| Bridge2006  | 1,146    | 129,255   | 494/564    | 1,817,018    |

## Running ALL4KT
To run ALL4KT, navigate to the `all4kt` folder and execute the following command:

```bash
python main.py --data_name [dataset_name]
```

Replace `[dataset_name]` with the name of the dataset you want to use (e.g., `assist2009`).

You can also use the following optional parameters:
- `--result`: Print the final result.
- `--summary`: Print the results of each iteration.
- `--detail`: Print the intermediate results of the KT model training process in each iteration.
- `--save_clr`: Save the final grouping result with the specified filename.

## Running Baseline Methods
To run the baseline methods, use the `train.py` script in the `baseline` folder. The script allows you to specify various parameters to customize the training process. Here's an example of how to run a baseline model:

```bash
python train.py --data_name bridge2006 --model_name ukt --mode KC --max_epochs 100 --folds 5
```

### Parameters
- `--data_name`: The name of the dataset to use (e.g., `bridge2006`).
- `--model_name`: The name of the model to train (e.g., `ukt`).
- `--mode`: The mode to run the model in (e.g., `KC`).
- `--max_epochs`: The maximum number of training epochs.
- `--folds`: The number of folds for cross-validation.

### Additional Information
- The experiments use five-fold cross-validation, which corresponds to `--folds 5`.
- The `--mode` parameter selects which method to use, including those mentioned in the paper (e.g., `Q`, `KC`, `Ours`).

For specific parameters and usage of each baseline method, refer to the corresponding `[kt.py]` files in the `baseline` folder.
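
To compare the modes on one dataset, the documented command can be repeated once per `--mode` value. The following is a minimal sketch of a driver script that does this. It uses only the flags listed above, and the helper names `build_command` and `run_all_modes` are hypothetical, not part of this repository. It assumes it is run from the `baseline` folder and that `Q`, `KC`, and `Ours` are all accepted by `--mode` for the chosen model.

```python
import subprocess
import sys


def build_command(data_name, model_name, mode, max_epochs=100, folds=5):
    """Return the argument list for one train.py run (documented flags only)."""
    return [
        sys.executable, "train.py",
        "--data_name", data_name,
        "--model_name", model_name,
        "--mode", mode,
        "--max_epochs", str(max_epochs),
        "--folds", str(folds),
    ]


def run_all_modes(data_name, model_name, modes=("Q", "KC", "Ours"), dry_run=True):
    """Build one command per mode; print them, or execute them if dry_run is False."""
    commands = [build_command(data_name, model_name, m) for m in modes]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing run
    return commands


if __name__ == "__main__":
    run_all_modes("bridge2006", "ukt")
```

With `dry_run=True` (the default in this sketch) the script only prints the commands, so they can be checked before any training starts.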

## Result Storage
Experimental results are saved in the `result` folder by default.
