Abstract: We consider conditional random fields (CRFs) with pattern-based potentials defined on a chain. In this model the energy of a string (labeling) $x_1 \ldots x_n$ is the sum of terms over intervals $[i, j]$, where each term is non-zero only if the substring $x_i \ldots x_j$ equals a prespecified pattern $w$. Such CRFs can be naturally applied to many sequence tagging problems. We present efficient algorithms for the three standard inference tasks in a CRF, namely computing (i) the partition function, (ii) marginals, and (iii) the MAP. Their complexities are respectively $O(nL)$, $O(nL\ell_{\max})$ and $O(nL \min\{|D|, \log(\ell_{\max}+1)\})$, where $L$ is the combined length of input patterns, $\ell_{\max}$ is the maximum length of a pattern, and $D$ is the input alphabet. This improves on the previous algorithms of Ye et al. (NIPS, 2009), whose complexities are respectively $O(nL|D|)$, $O(n|\Gamma|L^2\ell_{\max}^2)$ and $O(nL|D|)$, where $|\Gamma|$ is the number of input patterns. In addition, we give an efficient algorithm for sampling, and revisit the case of MAP with non-positive weights.
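To make the model concrete, the following is a minimal brute-force sketch of the energy, the partition function and the MAP labeling. It is not the algorithm of the paper: it enumerates all $|D|^n$ strings and is only usable for tiny inputs. It makes several simplifying assumptions that the abstract does not state: each pattern carries a single weight that does not depend on its position, the distribution is $p(x) \propto \exp(-E(x))$, and the MAP is the minimum-energy string. The names `energy`, `brute_force_partition` and `brute_force_map` are hypothetical.

```python
from itertools import product
from math import exp


def energy(x, patterns):
    """Sum the weight of every occurrence of every pattern in the string x.

    `patterns` maps a pattern string w to its weight (assumed here to be
    independent of the position of the occurrence).
    """
    total = 0.0
    for w, cost in patterns.items():
        # Slide the pattern over every interval of x that has length |w|.
        for i in range(len(x) - len(w) + 1):
            if x[i:i + len(w)] == w:
                total += cost
    return total


def brute_force_partition(n, alphabet, patterns):
    """Z = sum over all strings x of length n of exp(-E(x)) (assumed convention)."""
    return sum(
        exp(-energy("".join(s), patterns)) for s in product(alphabet, repeat=n)
    )


def brute_force_map(n, alphabet, patterns):
    """Return a minimum-energy string of length n by exhaustive search."""
    candidates = ("".join(s) for s in product(alphabet, repeat=n))
    return min(candidates, key=lambda x: energy(x, patterns))
```

For example, with the hypothetical weights `{"ab": -1.0, "b": 0.5}` the string `"abab"` contains two occurrences of each pattern, so its energy is $2 \cdot (-1.0) + 2 \cdot 0.5 = -1.0$. A reference implementation of this kind is mainly useful for checking a faster algorithm on small instances.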