Space Efficiencies in Discourse Modeling via Conditional Random Sampling

Brian Kjersten, Benjamin Van Durme

HLT-NAACL 2012 (modified: 12 Nov 2022)

Abstract: Recent exploratory efforts in discourse-level language modeling have relied heavily on calculating Pointwise Mutual Information (PMI), which becomes computationally expensive over large collections. Prior work has required aggressive pruning or independence assumptions to compute scores on large collections. We show that Conditional Random Sampling, a thus far underutilized technique, is a space-efficient means of representing the sufficient statistics in discourse that underlie recent PMI-based work. We demonstrate this in the context of inducing Schankian script-like structures over news articles.
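The abstract does not spell out how Conditional Random Sampling (CRS) stands in for full co-occurrence counts. What follows is a minimal Python sketch, not the authors' implementation. It assumes document-level co-occurrence and PMI(x, y) = log P(x, y) / (P(x) P(y)), and it keeps exact document frequencies next to each sample. All function names and the parameter `k` (sketch size) are hypothetical. Documents are randomly permuted, and each word stores only the k smallest permuted IDs from its postings list. For a word pair, the IDs up to the smaller of the two sketch cutoffs form a random sample of documents, and both sketches see every occurrence inside that range. The joint count is then scaled up from that sample.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def build_sketches(docs: List[Set[str]], k: int, seed: int = 0
                   ) -> Tuple[Dict[str, List[int]], Dict[str, int], int]:
    """Hypothetical helper: keep the k smallest permuted doc IDs per word."""
    num_docs = len(docs)
    perm = list(range(num_docs))
    random.Random(seed).shuffle(perm)  # perm[i] is the permuted ID of doc i
    postings: Dict[str, List[int]] = {}
    for doc_id, words in enumerate(docs):
        for w in words:
            postings.setdefault(w, []).append(perm[doc_id])
    sketches = {w: sorted(ids)[:k] for w, ids in postings.items()}
    doc_freqs = {w: len(ids) for w, ids in postings.items()}  # exact margins
    return sketches, doc_freqs, num_docs


def estimate_cooccurrence(w1: str, w2: str, sketches: Dict[str, List[int]],
                          doc_freqs: Dict[str, int], num_docs: int,
                          k: int) -> float:
    """Estimate how many documents contain both w1 and w2."""
    def cutoff(w: str) -> int:
        # A sketch that holds the full postings list covers every permuted ID.
        return num_docs - 1 if doc_freqs[w] <= k else sketches[w][-1]

    d_s = min(cutoff(w1), cutoff(w2))
    # Below d_s, both sketches contain every occurrence of their word.
    s1 = {i for i in sketches[w1] if i <= d_s}
    s2 = {i for i in sketches[w2] if i <= d_s}
    # Permuted IDs 0..d_s form a random sample of d_s + 1 documents.
    return len(s1 & s2) * num_docs / (d_s + 1)


def estimate_pmi(w1: str, w2: str, sketches: Dict[str, List[int]],
                 doc_freqs: Dict[str, int], num_docs: int, k: int) -> float:
    """Document-level PMI: log(a * N / (f1 * f2)), with a estimated by CRS."""
    a = estimate_cooccurrence(w1, w2, sketches, doc_freqs, num_docs, k)
    if a == 0:
        return float("-inf")
    return math.log(a * num_docs / (doc_freqs[w1] * doc_freqs[w2]))


if __name__ == "__main__":
    docs = [{"arrest", "charge"}, {"arrest", "convict"},
            {"arrest", "charge", "convict"}, {"charge"}]
    sk, df, n = build_sketches(docs, k=10)
    print(estimate_pmi("arrest", "charge", sk, df, n, k=10))
```

When `k` is at least as large as every document frequency, the sketches hold complete postings lists and the estimate is exact. Smaller `k` uses less space, but the joint counts are then only approximate. Storing k IDs per word in place of a full pair-count table is the kind of space trade-off the abstract points to.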
