
## Dataset Overview

This appendix includes three example subsets of our RedNote-Vibe dataset:

- **Training Set (Human)**: 1,000 human-written samples from the pre-GPT era (before 2022.12).
- **Training Set (AI)**: 1,000 samples generated by AI models.
- **Exploration Set**: 1,000 samples posted in the post-GPT era (after 2022.12), with features extracted using our PLAD framework.

## File Details

### 1. training_set_human.jsonl

#### Field Description
| Field Name | Type | Description |
|------------|------|-------------|
| `note_id` | string | Unique identifier for the note |
| `note_title` | string | Note title |
| `local_time` | string | Local timestamp (format: YYYYMMDDHH) |
| `desc` | string | Note body text |
| `liked_count` | integer | Number of likes |
| `collected_count` | integer | Number of times the note was collected (saved) |
| `comments_count` | integer | Number of comments |
| `domain` | string | Content domain classification |

#### Sample Data
```json
{
  "note_id": "09f0f5c483a0c1e9e9b3d3080271c5e1",
  "note_title": "伦敦日记Dec.15：致敬阿圭罗",
  "local_time": "2021121608",
  "desc": "下课发现整个朋友圈喜欢足球的朋友都发了和阿圭罗相关的动态，打开一看居然是官宣退役的新闻。我一直是阿根廷的球迷，2010年世界杯时我看了人生中第一场足球比赛，那年阿根廷发挥的并不理想，但我还是因为梅西喜欢上了足球，也喜欢上了总与他并肩的阿圭罗和那支阿根廷队。11年过去了，很多我曾经熟悉的面孔已经从球员变成了球迷，但是那些画面依然鲜活并深刻地保留在记忆中，从2014年世界杯憾负德国到2021年捧起美洲杯冠军，足球陪伴了我的整个青春，或许我喜欢的不光是足球，更是足球陪我走过的时光。",
  "liked_count": 38,
  "collected_count": 3,
  "comments_count": 10,
  "domain": "运动"
}
```
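Each line of the file is one JSON object with the fields above. A minimal sketch of reading such records and parsing the `local_time` field follows; the helper names `iter_jsonl` and `parse_local_time` are illustrative and not part of the dataset, and the in-memory sample stands in for a real file (its text fields are omitted for brevity).

```python
import json
from datetime import datetime
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def parse_local_time(value: str) -> datetime:
    """Parse a YYYYMMDDHH string such as "2021121608" into a naive datetime."""
    return datetime.strptime(value, "%Y%m%d%H")


# In-memory stand-in for one line of training_set_human.jsonl
# (hypothetical shortened record: note_title and desc are left out).
sample_lines = [
    '{"note_id": "09f0f5c483a0c1e9e9b3d3080271c5e1", "local_time": "2021121608", '
    '"liked_count": 38, "collected_count": 3, "comments_count": 10, "domain": "运动"}'
]

records = list(iter_jsonl(sample_lines))
posted_at = parse_local_time(records[0]["local_time"])
print(posted_at.isoformat())  # 2021-12-16T08:00:00
```

To read the actual file, pass an open file handle instead of the list, for example `open("training_set_human.jsonl", encoding="utf-8")`.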

### 2. training_set_ai.jsonl


#### Field Description
| Field Name | Type | Description |
|------------|------|-------------|
| `title` | string | Title of the generated post |
| `content` | string | Generated text content |
| `model_family` | string | AI model family (e.g., anthropic, gpt, deepseek, glm) |
| `model` | string | Specific AI model name (e.g., claude-sonnet-4, gpt-4.1, deepseek-r1-250120, glm-4) |
| `domain` | string | Content domain classification |

#### Sample Data
```json
{
  "title": "新疆自驾游超全装备清单！踩过雷的姐妹来分享经验啦✨",
  "content": "刚从新疆回来，必须跟大家分享一下这次自驾游的装备心得！有些东西真的超有用，有些完全是累赘，给大家避避雷～\n\n🧳 服装搭配篇：温差真的太大了！\n\n日常穿搭：白天真的热到爆，短袖短裤是标配，但一定要选吸汗透气的面料！我带的那件聚酯纤维T恤简直是灾难...\n\n防晒装备：防晒衣真的是救命神器！新疆的太阳不是开玩笑的，我第一天没涂防晒霜，晚上就红得像龙虾。宽边帽子和防晒袖套也要安排上！",
  "model_family": "anthropic",
  "model": "claude-sonnet-4",
  "domain": "旅行"
}
```
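The human and AI training files use different field names for the same content (`note_title`/`desc` versus `title`/`content`). One possible way to combine them into a single labeled set is sketched below; the target keys `title`, `text`, and `label`, and the convention `0 = human, 1 = AI`, are assumptions made for illustration rather than anything defined by the dataset.

```python
from typing import Dict, List


def to_common_schema(record: Dict, is_ai: bool) -> Dict:
    """Map a human or AI record onto shared keys: title, text, domain, label."""
    if is_ai:
        title, text = record["title"], record["content"]
    else:
        title, text = record["note_title"], record["desc"]
    # Assumed label convention: 0 = human-written, 1 = AI-generated.
    return {"title": title, "text": text, "domain": record["domain"], "label": int(is_ai)}


# Hypothetical shortened records following the two schemas above.
human_records = [{"note_title": "human title", "desc": "human body", "domain": "运动"}]
ai_records = [{"title": "ai title", "content": "ai body",
               "model_family": "anthropic", "model": "claude-sonnet-4", "domain": "旅行"}]

training_rows: List[Dict] = (
    [to_common_schema(r, is_ai=False) for r in human_records]
    + [to_common_schema(r, is_ai=True) for r in ai_records]
)
print([row["label"] for row in training_rows])  # [0, 1]
```

The sketch drops `model_family` and `model`; keep them in the output dict if the task is model attribution rather than binary detection.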

### 3. exploration_set_with_features.jsonl


#### Basic Field Description
| Field Name | Type | Description |
|------------|------|-------------|
| `note_id` | string | Unique identifier for the note |
| `note_title` | string | Note title |
| `time` | integer | Unix timestamp (seconds) |
| `local_time` | string | Local timestamp (format: YYYYMMDDHH) |
| `desc` | string | Note content body |
| `liked_count` | integer | Number of likes |
| `collected_count` | integer | Number of times the note was collected (saved) |
| `comments_count` | integer | Number of comments |
| `domain` | string | Content domain classification |
| `user_id` | string | Unique user identifier |
| `features` | object | Extracted features, keyed by feature name with numeric values |



#### Sample Data
```json
{
  "note_id": "685b74d000000000170338bf",
  "note_title": "日系才是我的舒适圈🧶",
  "time": 1750824144,
  "local_time": "2025062512",
  "desc": "日系感十足(̀⌄́)条纹真的好少年感#日系穿搭[话题]##潮流POV[话题]##flowfit[话题]##翻红吧衣橱[话题]##闪回千禧年代[话题]##时代的眼泪[话题]##boxyfit[话题]#",
  "liked_count": 91,
  "collected_count": 7,
  "comments_count": 13,
  "domain": "穿搭",
  "user_id": "5b1a65546b58b717f214ba63",
  "features": {
    "sentence_count": 1.0,
    "word_count": 30.0,
    "char_count": 99.0,
    "ttr": 0.8,
    "emotional_intensity_control": 0.3,
    "personal_emotional_grounding": 0.0,
    ...
  }
}
```
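The `time` and `local_time` fields describe the same moment in two formats. In the sample above they agree if `local_time` is read as UTC+8; that offset is an inference from this one record, not something the dataset documents. A small sketch of the conversion under that assumption (the function name `unix_to_local_time` is illustrative):

```python
from datetime import datetime, timedelta, timezone

# Assumption: local_time is expressed in UTC+8 (consistent with the sample record).
UTC_PLUS_8 = timezone(timedelta(hours=8))


def unix_to_local_time(ts: int, tz: timezone = UTC_PLUS_8) -> str:
    """Convert a Unix timestamp in seconds to a YYYYMMDDHH string in the given zone."""
    return datetime.fromtimestamp(ts, tz).strftime("%Y%m%d%H")


print(unix_to_local_time(1750824144))  # 2025062512
```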

The feature names used in the data correspond to the feature names introduced in the paper according to the following mapping (data name to paper name):

```python
feature_name_mapping = {
    'emotional_intensity_control': 'Emotional Intensity',
    'personal_emotional_grounding': 'Personal Emotional Grounding',
    'life_detail_vividness': 'Sensory Detail Richness',
    'social_connectedness': 'Social Connectedness',
    'theory_of_mind_and_empathy': 'Empathetic Engagement',
    'interactive_and_dialogic_stance': 'Interactive and Dialogic Stance',
    'unique_emoji_ratio': 'Unique Emoji Ratio',
    'emoji_density': 'Emoji Density',
    'cognitive_complexity_and_openness': 'Perspectival Complexity',
    'narrative_structure_flexibility': 'Narrative Structure Flexibility',
    'argumentation_strength_and_nuance': 'Dialectical Argumentation Strength',
    'metacognition_and_self_correction': 'Self-Correction',
    'worldview_and_values': 'Axiological Coherence',
    'temporal_orientation_and_integration': 'Temporal Orientation and Integration',
    'sentence_count': 'Sentence Count',
    'word_count': 'Word Count',
    'char_count': 'Character Count',
    'vocabulary_and_stylistic_personalization': 'Lexical-Stylistic Personalization',
    'natural_language_flow_and_rhythm': 'Prosodic Rhythm Consistency',
    'humanized_imperfection': 'Imperfection',
    'humor_and_irony': 'Rhetorical Sophistication',
    'punct_ratio': 'Punctuation Ratio',
    'number_ratio': 'Number Ratio',
    'ttr': 'Type-Token Ratio',
    'word_freq_entropy': 'Word Frequency Entropy',
    'word_burstiness': 'Word Burstiness',
    'lexical_cohesion': 'Lexical Cohesion',
    'sentence_similarity_mean': 'Inter-Sentential Sentence Similarity',
    'immediate_repetition_ratio': 'Immediate Repetition Density',
    'phrase_repetition_ratio': 'Phrasal Repetition Frequency',
    'sentence_burstiness': 'Sentence Burstiness'
}
```
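As a usage sketch, the mapping can be applied to the `features` object of a record to obtain paper-style names. The helper `rename_features` is hypothetical, and only three entries of the mapping are repeated here to keep the block self-contained; keys without an entry are left unchanged.

```python
from typing import Dict

# Three entries copied from feature_name_mapping above, for a self-contained example.
mapping_subset = {
    "emotional_intensity_control": "Emotional Intensity",
    "word_count": "Word Count",
    "ttr": "Type-Token Ratio",
}


def rename_features(features: Dict[str, float], mapping: Dict[str, str]) -> Dict[str, float]:
    """Return a copy of `features` keyed by paper names; unmapped keys are kept as-is."""
    return {mapping.get(name, name): value for name, value in features.items()}


# Values taken from the sample record in the exploration set.
features = {"word_count": 30.0, "ttr": 0.8, "emotional_intensity_control": 0.3}
renamed = rename_features(features, mapping_subset)
print(renamed)
# {'Word Count': 30.0, 'Type-Token Ratio': 0.8, 'Emotional Intensity': 0.3}
```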
