Abstract: Document summarization facilitates efficient identification and assimilation of user-relevant content, a process inherently influenced by individual subjectivity. Discerning $\textit{subjective}$ salient information within a document, particularly when it has multiple facets, poses significant challenges. This complexity underscores the necessity for $\textit{personalized summarization}$. However, training models for personalized summarization has so far been challenging, particularly because diverse training data containing both user preference history (i.e., $\textit{click-skip}$ trajectory) and expected (gold-reference) summaries are scarce. The MS/CAS PENS dataset is a rare resource in this direction, but its training data contains only preference history $\textit{without any target summaries}$, which blocks end-to-end supervised learning. In addition, the diversity of topic transitions along the trajectory is relatively low, leaving room to improve generalization. To address these limitations, we first introduce a novel metric for evaluating the diversity of user preference data, called DegreeD. We then propose PerAugy, a novel data augmentation technique based on cross-trajectory shuffling and summary-content perturbation that increases the DegreeD score and thereby significantly boosts the accuracy of four state-of-the-art (SOTA) baseline user-encoders commonly used in personalized summarization frameworks (best result: $0.132\uparrow$ w.r.t. AUC). We select two such SOTA summarizer frameworks as baselines and observe that, when augmented with their corresponding improved user-encoders, they consistently show an increase in personalization (avg. boost: $61.2\%\uparrow$ w.r.t. the PSE-SU4 metric). This further establishes the efficacy of PerAugy as an augmentation method to boost personalized summarizers.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=GYNFv93NbJ
Changes Since Last Submission: We are resubmitting the manuscript in accordance with the template. The earlier submission contained some conflicting packages that had not been removed. No other changes were made, except that the footnote content from the abstract has been moved to the introduction.
Best,
Authors
Assigned Action Editor: ~Branislav_Kveton1
Submission Number: 5306