Abstract: Data science, the analysis of huge amounts of data to gain new knowledge, is now widely used, and this widespread use has become socially important. Feature engineering, the process of extracting features from data, is one of the main tasks in data science; because this task relies on the experience of experts, research is under way to automate it. In this paper, we propose a novel approach that automates feature identification from the textual information in data column names. Specifically, we apply natural language processing and source code analysis to the data descriptions and source code in Python notebooks to build a knowledge database, with a particular focus on datetime features. Using that knowledge database, we develop a system that recommends datetime features for newly given text information. In experiments, we confirmed the classification accuracy of the knowledge database and applied the database to actual forecasting tasks such as home price forecasting, achieving an average accuracy gain of 5.76%.