Two-step validation in character-based ingredient normalization

Jun Harashima, Yoshiaki Yamada

2018 (modified: 10 Nov 2022), CEA@IJCAI 2018, Readers: Everyone

Abstract: Although ingredients are important items of information in recipes, they are difficult for computers to process because they are informal, user-generated text. To normalize ingredients, we can use a character-based encoder-decoder model that takes the character sequence of an ingredient as input and outputs its canonical form. However, this model has two problems. First, it often generates unnatural sequences as outputs. Second, the generated sequences are sometimes unrelated to the original ingredient. We therefore propose a two-step validation to generate better normalizations. In the first validation step, we use a trie to limit the normalization candidates to existing sequences. In the second validation step, we rerank the normalization candidates based on their similarity to the original ingredient. We conducted experiments on a corpus of approximately 10,000 pairs of ingredients and their canonical forms, and showed that the proposed validation improves the performance of encoder-decoder models.
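The two validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure or the decoding procedure, so this sketch assumes a beam search over characters, uses `difflib.SequenceMatcher` as a stand-in similarity, and replaces the trained encoder-decoder with a hypothetical `toy_score_next` function. The names `Trie`, `constrained_candidates`, `rerank`, and the small vocabulary are likewise illustrative assumptions.

```python
from difflib import SequenceMatcher

END = "\0"      # sentinel marking the end of a canonical form
FLOOR = -5.0    # assumed log-score for characters the scorer does not mention


class Trie:
    """Character trie over the known canonical forms."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[END] = {}

    def next_chars(self, prefix):
        """Characters (possibly END) that may follow `prefix` in the trie."""
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return []
        return list(node)


def toy_score_next(source, prefix):
    """Hypothetical stand-in for a decoder: favours copying `source`."""
    i = len(prefix)
    target = source[i] if i < len(source) else END
    return {target: -0.1}


def constrained_candidates(source, trie, score_next, beam_size=3, max_len=30):
    """Step 1: beam search that only expands characters allowed by the trie."""
    beams, finished = [("", 0.0)], []
    for _ in range(max_len + 1):
        expanded = []
        for prefix, score in beams:
            scores = score_next(source, prefix)
            for ch in trie.next_chars(prefix):
                total = score + scores.get(ch, FLOOR)
                if ch == END:
                    finished.append((prefix, total))
                else:
                    expanded.append((prefix + ch, total))
        if not expanded:
            break
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_size]
    return finished


def rerank(source, candidates):
    """Step 2: order candidates by similarity to the original ingredient."""
    def key(item):
        text, model_score = item
        similarity = SequenceMatcher(None, source, text).ratio()
        return (similarity, model_score)  # model score only breaks ties
    return [text for text, _ in sorted(candidates, key=key, reverse=True)]


# Illustrative vocabulary and input; not taken from the paper's corpus.
vocabulary = ["tomato", "potato", "tomato sauce", "soy sauce"]
trie = Trie(vocabulary)
candidates = constrained_candidates("tomatoes", trie, toy_score_next)
ranking = rerank("tomatoes", candidates)
print(ranking)
```

Because every expansion is restricted to trie edges, each finished candidate is guaranteed to be an existing canonical form, which addresses the first problem (unnatural outputs). The reranking step addresses the second problem by preferring candidates that stay close to the input string; a real system might combine the similarity with the model score rather than use the score only as a tie-breaker.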
