Abstract: Cascaded approach has been used for a long time to conduct sub-tasks in order to accomplish a major task. We put cascaded approach in a probabilistic framework and analyze possible reasons for cascaded errors. To reduce the occurrence of cascaded errors, we need to add a constraint when performing joint training. We suggest a pseudo Conditional Random Field (pseudo-CRF) approach that models two sub-tasks as two Conditional Random Fields (CRFs). We then present the formulation in the context of a linear chain CRF for solving problems on sequence data. In conducting joint training for a pseudo-CRF, we reuse all existing well-developed efficient inference algorithms for a linear chain CRF, which would otherwise require the use of approximate inference algorithms or simulations that involve long computational time. Our experimental results show an interesting fact that a jointly trained CRF model in a pseudo-CRF may perform worse than a separately trained CRF on a sub-task. However the overall system performance of a pseudo-CRF would outperform that of a cascaded approach. We implement the implicit constraint in the form of a soft constraint such that users can define the penalty cost for violating the constraint. In order to work on large-scale datasets, we further suggest a parallel implementation of the pseudo-CRF approach, which can be implemented on a multi-core CPU or GPU on a graphics card that supports multi-threading. Our experimental results show that it can achieve a 12 times increase in speedup.
External IDs:dblp:conf/icdm/ChanL10
Loading