Measuring Information Distortion in Hierarchical Ultra-Long Novel Reconstruction: The Optimal Expansion Ratio

ACL ARR 2025 July Submission56 Authors

Date: 20 Jul 2025 (modified: 01 Sept 2025)
License: CC BY 4.0
Abstract: A two-stage novel generation framework (outline -> section outline -> manuscript) is widely used in long novel generation (e.g., \textsc{DOME}, \textsc{Plan\&Write}, \textsc{Long Writer}), but such frameworks have rarely been studied for the reconstruction of ultra-long novels (>1M words). Building on recent text compression methods (\textsc{LLMZip}, \textsc{LLM2Vec}), we conduct an information-theoretic analysis to quantify semantic distortion under different compression-expansion ratios, and we examine how outline length affects information preservation. Experiments on ultra-long novels show that the optimal compression-expansion ratio significantly reduces semantic distortion compared with non-optimal ratios.
Paper Type: Short
Research Area: Summarization
Research Area Keywords: automatic evaluation, text-to-text generation, extractive summarisation, abstractive summarisation, long-form summarization, sentence compression
Contribution Types: Data analysis
Languages Studied: Chinese, English
Submission Number: 56