Abstract: Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. In previous work, we have shown how the robustness of stochastic language models can be combined with the expressiveness of multimodal grammars by adding a finite-state edit machine to the multimodal language processing cascade. In this paper, we present an approach where the edits are trained from data using a noisy channel model paradigm. We evaluate this model and compare its performance against hand-crafted edit machines from our previous work in the context of a multimodal conversational system (MATCH)
0 Replies
Loading