========================================
📋 Task: commonsense170k | Split: train
🔑 Keys Mapped -> Q: 'instruction', A: 'output', GT: 'None'
========================================
📊 Raw Text Length Analysis (chars)
========================================
Question (instruction):
  Mean : 422.74
  Std  : 336.46
  Min  : 115
  Max  : 4504
  Count: 170300

Answer (output):
  Mean : 29.04
  Std  : 0.88
  Min  : 26
  Max  : 31
  Count: 170300
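
A minimal sketch of how the five fields above (Mean, Std, Min, Max, Count) could be computed from character lengths. The report does not say whether Std is the population or the sample standard deviation, so the population form is an assumption here. The helper names `summarize` and `char_lengths` and the toy records are hypothetical.

```python
import statistics


def summarize(lengths):
    """Return the five summary fields used in the report for a list of lengths."""
    return {
        "Mean": statistics.fmean(lengths),
        # Assumption: population standard deviation (the report does not specify).
        "Std": statistics.pstdev(lengths),
        "Min": min(lengths),
        "Max": max(lengths),
        "Count": len(lengths),
    }


def char_lengths(records, key):
    """Raw character length of one field across all records."""
    return [len(record[key]) for record in records]


# Toy records, invented for illustration only (not taken from the dataset).
records = [
    {"instruction": "Is the sky blue?", "output": "yes"},
    {"instruction": "Pick one: a or b.", "output": "a"},
]

for key in ("instruction", "output"):
    stats = summarize(char_lengths(records, key))
    print(key, {name: round(value, 2) for name, value in stats.items()})
```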

========================================
🔢 Tokenized Length Analysis (tokens)
========================================
Question Tokens:
  Mean : 110.18
  Std  : 76.67
  Min  : 41
  Max  : 1098
  Count: 170300

Answer Tokens:
  Mean : 6.94
  Std  : 0.23
  Min  : 6
  Max  : 7
  Count: 170300

Total Tokens:
  Mean : 117.12
  Std  : 76.72
  Min  : 47
  Max  : 1105
  Count: 170300
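
The Total row is consistent with a per-example sum of question and answer tokens: 110.18 + 6.94 = 117.12 for the mean, 41 + 6 = 47 for the minimum, and 1098 + 7 = 1105 for the maximum. A rough sketch of that bookkeeping follows. The tokenizer behind the report is not named, so a whitespace split stands in for it here; `whitespace_tokenize` and `token_lengths` are hypothetical names, and the resulting counts will not match the report.

```python
def whitespace_tokenize(text):
    """Stand-in tokenizer (assumption): split on whitespace."""
    return text.split()


def token_lengths(records, tokenize=whitespace_tokenize):
    """Return (question, answer, total) token counts for each record."""
    rows = []
    for record in records:
        question = len(tokenize(record["instruction"]))
        answer = len(tokenize(record["output"]))
        rows.append((question, answer, question + answer))
    return rows


# Toy record, invented for illustration only.
print(token_lengths([{"instruction": "a b c", "output": "d e"}]))
```

Passing a different callable as `tokenize` swaps in another tokenizer without changing the rest of the sketch.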

========================================
📈 Cumulative Token Length Buckets
========================================
--- Question Tokens (instruction) ---
<= 128: 133416 (78.34%)
<= 256: 149468 (87.77%)
<= 512: 170286 (99.99%)
<= 1024: 170299 (100.00%)

--- Answer Tokens (output) ---
<= 128: 170300 (100.00%)
<= 256: 170300 (100.00%)
<= 512: 170300 (100.00%)
<= 1024: 170300 (100.00%)

--- Total Tokens ---
<= 128: 130877 (76.85%)
<= 256: 147924 (86.86%)
<= 512: 170285 (99.99%)
<= 1024: 170299 (100.00%)
========================================
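
The bucket rows are cumulative: each threshold counts every example at or below it. Exactly one example exceeds 1024 tokens (170300 - 170299 = 1), which matches the maxima of 1098 question tokens and 1105 total tokens. The `100.00%` shown for 170299 is a two-decimal rounding of roughly 99.9994%. The sketch below shows one way such rows could be produced; `bucket_report` and `format_rows` are hypothetical names, and the default thresholds are copied from the report.

```python
def bucket_report(lengths, thresholds=(128, 256, 512, 1024)):
    """Cumulative counts: how many lengths are <= each threshold."""
    total = len(lengths)
    rows = []
    for threshold in thresholds:
        count = sum(1 for n in lengths if n <= threshold)
        rows.append((threshold, count, 100.0 * count / total))
    return rows


def format_rows(rows):
    """Render rows in the '<= T: count (pct%)' style used by the report."""
    return [f"<= {t}: {c} ({p:.2f}%)" for t, c, p in rows]


# Toy lengths, invented for illustration only.
for line in format_rows(bucket_report([100, 200, 600, 2000])):
    print(line)

# Two-decimal formatting hides the single example above 1024 tokens.
print(f"{100.0 * 170299 / 170300:.2f}%")
```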