Keywords: Feature Learning, Shapley Value, Click-through Rate, Asset Pricing
Abstract: A machine learning system is typically composed of a model and data. In many applications, features are the inputs a model uses to generate meaningful predictions. While a large number of model-centric solutions have been proposed to improve the capabilities of models, there has been very limited exploration of how to discover useful feature interactions from a data-centric perspective. In this work, we propose a general framework named Feature Shapley for discovering useful high-order feature interactions based on Feature Shapley values and thereby generating new features. Computing exact Feature Shapley values requires evaluating exponentially many feature subsets and is therefore computationally infeasible, so we apply Monte-Carlo approximation and an early truncation trick to estimate them efficiently. Experimental results indicate that the decisive feature interactions exploited by Feature Shapley are of vital importance for click-through rate (CTR) prediction and asset pricing tasks. With these decisive feature interactions, even simple models (e.g., linear regression (LR) or shallow neural networks) can achieve similar or even better performance compared with more complex approaches while retaining their superior interpretability.
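To make the estimation idea concrete, here is a minimal sketch of a generic truncated Monte-Carlo permutation estimator for Shapley values. It is not the authors' implementation: the abstract does not specify the value function or the truncation rule, so both are assumptions here. The names `tmc_shapley`, `value`, `num_permutations`, and `tolerance` are hypothetical; `value` stands in for something like the validation score of a model trained on a given set of candidate feature interactions, and the truncation rule (stop a permutation once the running coalition is within `tolerance` of the full-set value) is one common heuristic, chosen here for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, Sequence


def tmc_shapley(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    num_permutations: int = 200,
    tolerance: float = 1e-3,
    seed: int = 0,
) -> Dict[str, float]:
    """Truncated Monte-Carlo estimate of Shapley values (illustrative sketch).

    players: candidate features or feature interactions being scored.
    value:   hypothetical utility of a subset of players, e.g. a model's
             validation score when trained on that subset.
    """
    rng = random.Random(seed)
    full_value = value(frozenset(players))
    totals = {p: 0.0 for p in players}

    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)  # sample a random ordering of the players
        coalition = set()
        prev_value = value(frozenset())
        for p in order:
            # Assumed early-truncation rule: once the coalition already performs
            # about as well as the full set, the remaining players in this
            # permutation are assigned zero marginal contribution.
            if abs(full_value - prev_value) < tolerance:
                break
            coalition.add(p)
            cur_value = value(frozenset(coalition))
            totals[p] += cur_value - prev_value  # marginal contribution of p
            prev_value = cur_value

    # Average marginal contributions over the sampled permutations.
    return {p: totals[p] / num_permutations for p in players}


if __name__ == "__main__":
    # Toy value function with a pure pairwise interaction: only the pair
    # {"a", "b"} together is useful, and "c" is irrelevant.
    def toy_value(subset: FrozenSet[str]) -> float:
        return 1.0 if {"a", "b"} <= subset else 0.0

    estimates = tmc_shapley(["a", "b", "c"], toy_value)
    print(estimates)  # "a" and "b" share the credit; "c" gets 0.0
```

In the toy example, neither `a` nor `b` is useful alone, yet each receives roughly half of the credit because their interaction is what carries the value. This illustrates why Shapley-style attribution can surface useful feature interactions. Truncation saves value-function calls (in practice, model retraining) once the remaining players can no longer change the coalition's value by more than `tolerance`.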
One-sentence Summary: We present a general framework named Feature Shapley with the purpose of discovering useful feature interactions and thereby generating new features.