# ChMapData: Chinese Memory-aware Proactive Dataset

## Overview
The **Ch**inese **M**emory-**a**ware **P**roactive **Data**set (**ChMapData**) is the dataset introduced in the paper "Interpersonal Memory Matters: A New Task for Proactive Dialogue Utilizing Conversational History". It is designed for training and evaluating a model's ability to perform **proactive topic introduction** based on conversational history, and it supports the memory-aware proactive dialogue framework proposed in that paper.

## Dataset Composition
The dataset contains 4 key components:

### 1. Overall_dialogue_review
- **Purpose**: End-to-end evaluation (not for training)
- **Content**:
  - Historical dialogues
  - Final-day dialogue
  - Dates of the historical dialogues that the final-day dialogue refers back to

### 2. Callback Dialogue
- **Purpose**: Train Memory-Aware Proactive Response Generation models
- **Content**:
  - Historical dialogue from the past day with summarized topics
  - Current dialogue initiation
  - Follow-up dialogues demonstrating proactive topic guidance

### 3. Dialogue Data
- **Purpose**: Train/Evaluate Topic Summarization models
- **Content**:
  - Dialogues with corresponding topic and sub-topic annotations

### 4. Topic Rank
- **Purpose**: Train/Evaluate Topic Retrieval models
- **Content**:
  - Dialogues with candidate historical topics
  - Ground-truth annotation of the most relevant historical topic
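
As an illustration of how this component could be used for evaluation, the sketch below scores a topic retriever by top-1 accuracy against the ground-truth topic. The record layout (`dialogue`, `candidate_topics`, `gold_index`), the metric choice, and the toy word-overlap ranker are all assumptions made for this example; the released files may use different field names, and the paper may report different metrics.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TopicRankExample:
    # Hypothetical field names; the actual files may use different keys.
    dialogue: List[str]          # utterances of the current dialogue
    candidate_topics: List[str]  # candidate historical topics
    gold_index: int              # index of the most relevant candidate


# A ranker takes (dialogue, candidates) and returns the index it predicts.
Ranker = Callable[[List[str], List[str]], int]


def top1_accuracy(examples: List[TopicRankExample], rank_fn: Ranker) -> float:
    """Fraction of examples where the predicted topic matches the annotation."""
    if not examples:
        return 0.0
    hits = sum(
        1 for ex in examples
        if rank_fn(ex.dialogue, ex.candidate_topics) == ex.gold_index
    )
    return hits / len(examples)


def overlap_ranker(dialogue: List[str], candidates: List[str]) -> int:
    """Toy baseline: pick the candidate sharing the most words with the dialogue."""
    words = set(" ".join(dialogue).lower().split())
    scores = [len(words & set(c.lower().split())) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Made-up English placeholder data, not taken from ChMapData.
examples = [
    TopicRankExample(
        dialogue=["I finally booked the flight to Chengdu"],
        candidate_topics=["job interview", "trip to Chengdu", "new puppy"],
        gold_index=1,
    ),
    TopicRankExample(
        dialogue=["The puppy chewed my shoes again"],
        candidate_topics=["new puppy", "trip to Chengdu"],
        gold_index=0,
    ),
]

accuracy = top1_accuracy(examples, overlap_ranker)
print(f"top-1 accuracy: {accuracy:.2f}")
```

A real retriever would replace `overlap_ranker`; whitespace tokenization in particular is not suitable for Chinese text and is used here only to keep the sketch self-contained.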


## Key Features
- The first Chinese dataset focused on memory-aware proactive dialogue;
- Contains both training components and evaluation benchmarks;
- Supports modular evaluation of the individual model components in the proposed framework;
- Provides an end-to-end evaluation protocol for comprehensive system assessment.