# ProductQA dataset

## Introduction

ProductQA consists of 88,229 customer-service QA pairs organized into 26 tasks. Each task corresponds to one Amazon product category and contains 3,393 QA pairs on average. The questions are derived from real Amazon user queries and include fact-based questions, reasoning questions, and product recommendation queries. The dataset assesses an agent's ability to manage historical information and accumulated knowledge, use tools, interact with humans, evaluate itself, and reflect on its behavior.

## File structure

```
- test
  - all_pans
    - metadata.json
    - qa.jsonl
    - schema.json
  - ...
- train
  - blades
    - metadata.json
    - qa.jsonl
    - schema.json
  - ...
- eval.py
- LICENSE
- README.md
```

The 26 tasks in ProductQA are split into two subsets: the train set covers 20 product categories and the test set covers the remaining 6. Each category directory contains three files. `qa.jsonl` holds all QA examples for the category. `schema.json` lists the feature names of the category and the values each feature can take. `metadata.json` gives the feature name and value pairs for each product.
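The Python sketch below shows one way to read a category and tally its QA examples. It assumes only the layout above (one JSON object per line in `qa.jsonl`, plain JSON in `schema.json` and `metadata.json`) and makes no assumption about the field names inside the files. `load_category` and `count_qa_pairs` are hypothetical helpers, not part of this repository.

```python
import json
from pathlib import Path


def load_category(root, split, category):
    """Load the three files of one category (hypothetical helper).

    Assumes the layout <root>/<split>/<category>/{qa.jsonl,schema.json,metadata.json};
    no field names inside the files are assumed.
    """
    cat_dir = Path(root) / split / category
    # qa.jsonl: one JSON object (one QA example) per non-empty line
    with open(cat_dir / "qa.jsonl", encoding="utf-8") as f:
        qa = [json.loads(line) for line in f if line.strip()]
    # schema.json and metadata.json are ordinary JSON documents
    with open(cat_dir / "schema.json", encoding="utf-8") as f:
        schema = json.load(f)
    with open(cat_dir / "metadata.json", encoding="utf-8") as f:
        metadata = json.load(f)
    return {"qa": qa, "schema": schema, "metadata": metadata}


def count_qa_pairs(root):
    """Return {split: {category: number of QA examples}} (hypothetical helper)."""
    counts = {}
    for split in ("train", "test"):
        split_dir = Path(root) / split
        counts[split] = {}
        # Every subdirectory of a split is treated as one product category
        for cat_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            with open(cat_dir / "qa.jsonl", encoding="utf-8") as f:
                counts[split][cat_dir.name] = sum(1 for line in f if line.strip())
    return counts
```

On a complete copy, `count_qa_pairs(root)` (with `root` the dataset directory) should report 20 train and 6 test categories; if every QA pair is stored in these `qa.jsonl` files, the counts should sum to the 88,229 stated in the introduction.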

`eval.py` is the evaluation script. It takes two required parameters: `--input_dir`, the directory containing the 6 prediction files generated by your model (one per test category, named `${product_category}.jsonl`), and `--output_dir`, the directory where the evaluation results are saved. The optional `--long_eval` parameter controls whether long answers are also evaluated. Long-answer evaluation uses GPT-4, so set the `OPENAI_API_KEY` environment variable before enabling it.
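Before running `python eval.py --input_dir <predictions_dir> --output_dir <results_dir>`, a quick check like the Python sketch below can catch missing prediction files. `check_predictions` is a hypothetical helper, not part of `eval.py`; it assumes each prediction file is named after a category directory under `test/` (for example `all_pans.jsonl`) and that long-answer evaluation needs `OPENAI_API_KEY` in the environment.

```python
import os
from pathlib import Path


def check_predictions(test_dir, input_dir, long_eval=False):
    """Return a list of problems to fix before running eval.py (empty list = looks ready).

    Hypothetical helper: assumes prediction files follow the
    ${product_category}.jsonl convention, one per directory under test/.
    """
    problems = []
    categories = sorted(p.name for p in Path(test_dir).iterdir() if p.is_dir())
    if len(categories) != 6:
        problems.append(f"expected 6 test categories, found {len(categories)}")
    for cat in categories:
        if not (Path(input_dir) / f"{cat}.jsonl").is_file():
            problems.append(f"missing prediction file: {cat}.jsonl")
    # Long-answer evaluation calls GPT-4, so the OpenAI key must be available
    if long_eval and not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

How `--long_eval` is passed on the command line (as a bare flag or with a value) is not specified here, so check `eval.py` before enabling it.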

## Citation

If you find ProductQA useful, please cite our work.

```

```
