Abstract: Happiness prediction from large-scale online data with machine learning (ML) models is an emerging research topic that bears on issues ranging from personal growth to social stability. Many advanced ML models with explanations are used for online happiness assessment while maintaining high accuracy. However, these models often lack expert feedback and sociological theory, which weakens the link between prediction results and the right reasons for them. Sociological studies have shown that happiness factors exhibit inherent primary and secondary relations, which can serve as domain knowledge to guide model training. Inspired by such insights, this article first provides an empirical study of explanation consistency, and then studies how to represent domain knowledge and introduce it as constraints to make ML models more trustworthy. We achieve this by 1) proving that multiple prediction models with additive factor attributions possess the desirable property of primary-and-secondary-relation consistency; and 2) showing that quantified factor relations can be represented as an importance distribution for encoding domain knowledge. Differences in factor explanations among prediction models are penalized with the Wasserstein distance. Experimental results on two online datasets show that domain knowledge in the form of stable factor relations exists; using it not only improves happiness prediction accuracy but also reveals more significant happiness factors for decision support.
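The Wasserstein-distance penalty mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; the factor names and importance values are hypothetical, and the function below computes the 1-D Wasserstein (earth mover's) distance between two discrete factor-importance distributions on the same ordered support with unit spacing, i.e., the sum of absolute CDF differences.

```python
def wasserstein_1d(p, q):
    """W1 distance between two discrete distributions defined on the
    same ordered support with unit spacing: sum of |CDF differences|."""
    assert len(p) == len(q)
    cdf_p = cdf_q = 0.0
    total = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi  # running CDF of distribution p
        cdf_q += qi  # running CDF of distribution q
        total += abs(cdf_p - cdf_q)
    return total

# Hypothetical normalized factor-importance attributions from two
# prediction models (factors could be, e.g., income, health, marriage,
# leisure -- illustrative only, not from the paper's datasets).
model_a = [0.40, 0.30, 0.20, 0.10]
model_b = [0.35, 0.30, 0.25, 0.10]

# A small penalty indicates the two models largely agree on the
# primary/secondary ordering of factor importance.
penalty = wasserstein_1d(model_a, model_b)
print(penalty)
```

In a training loop, such a penalty term could be added to the prediction loss so that models with divergent factor explanations are discouraged, which is the spirit of the constraint the abstract describes.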
External IDs: dblp:journals/tcss/WuLTXY25