Towards Measuring Predictability: To What Extent Can Data-Driven Approaches Extract Deterministic Relations from Data, Exemplified with Time Series Prediction and Classification

TMLR Paper2739 Authors

23 May 2024 (modified: 17 Sept 2024). Under review for TMLR. License: CC BY 4.0
Abstract: Minimizing a loss function is an essential ingredient of machine learning: model parameters are fitted so that the model extracts relations hidden in the data. The smaller the loss on the various splits of a dataset, the better the model is assumed to perform. However, datasets are usually generated by dynamics consisting of deterministic components, where relations are clearly defined and therefore learnable, and stochastic components, where outcomes are random and therefore unpredictable. Depending on the relative amplitudes of the deterministic and stochastic processes, the best achievable loss varies, and in real data science scenarios it is usually unknown. In this work, a statistical framework is developed that provides measures of how predictable a target is given the available input data and, after a machine learning model has been trained, of the extent to which the model has missed the deterministic relations. The framework thus allows model errors to be separated into a part that is unpredictable from the given input and a systematic miss of deterministic relations. This extends the definitions of model success and failure, as well as of convergence of a training process. Moreover, it is demonstrated how such measures can enrich the model training procedure and guide the combination of different models. The framework is showcased on time series data from several synthetic and real-world datasets. The implementation of the models used and of the measures for quantifying the deterministic relations is provided via the git repository .... (the repository will be published and the link will be provided upon acceptance; for the review process, it is provided as a supplementary zip file.)
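The abstract does not spell out the framework's measures. The following minimal sketch is not the authors' implementation; it only illustrates the underlying idea for squared loss on synthetic data where the deterministic part f(x) is known by construction. With zero-mean noise independent of the input, a model's mean squared error splits (in expectation) into the irreducible noise variance and the systematic miss E[(f(x) - g(x))^2]. The names `make_dataset` and `decompose_error`, the sine signal, the noise level, and the underfitting "model" are all hypothetical choices for illustration. In real data f is unknown, which is precisely the gap the proposed framework aims to address.

```python
import math
import random
import statistics


def make_dataset(n, noise_std, seed=0):
    """Hypothetical synthetic series: deterministic sine part plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [i * 0.05 for i in range(n)]
    deterministic = [math.sin(x) for x in xs]          # learnable part f(x)
    ys = [d + rng.gauss(0.0, noise_std) for d in deterministic]  # observed target
    return xs, deterministic, ys


def mse(a, b):
    """Mean squared difference between two equally long sequences."""
    return statistics.fmean((u - v) ** 2 for u, v in zip(a, b))


def decompose_error(preds, deterministic, ys):
    """Split a model's MSE into an unpredictable part and a systematic miss.

    Only possible here because the deterministic part is known (synthetic data).
    """
    total = mse(preds, ys)                       # error the model actually makes
    irreducible = mse(deterministic, ys)         # error of the ideal predictor f(x)
    systematic_miss = mse(preds, deterministic)  # deterministic relations missed
    return total, irreducible, systematic_miss


if __name__ == "__main__":
    xs, f, ys = make_dataset(20000, noise_std=0.5)
    preds = [0.8 * v for v in f]  # hypothetical model that underfits the amplitude
    total, irreducible, miss = decompose_error(preds, f, ys)
    print(f"total={total:.4f} irreducible={irreducible:.4f} miss={miss:.4f}")
```

Because the cross term between the systematic miss and the noise averages to roughly zero, `total` is close to `irreducible + miss`. A model whose `miss` is near zero has, in this simplified setting, converged to the best achievable loss, even though `total` stays well above zero.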
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Fredrik_Daniel_Johansson1
Submission Number: 2739