Abstract: This work presents a hybrid modeling approach for improving the prediction of energy output in off-grid solar photovoltaic systems. To address the intrinsic variability of solar generation data, the proposed method combines unsupervised clustering using K-Means with local regression models, specifically linear regression, K-nearest neighbors (KNN), and decision trees. The approach is demonstrated on a real-world case study of a self-sufficient household in Galicia, Spain, which provides a one-year dataset at 10-minute resolution. After preprocessing and exploratory data analysis, the data are divided into clusters so that each model can specialize in one operating regime. For each cluster, hyperparameter-optimized regression models are trained and evaluated using cross-validation and statistical tests, namely Kruskal–Wallis and Dunn’s post hoc analysis. The results show that a segmentation into three clusters achieves a favorable balance between local specialization and overall robustness. Linear regression without intercept achieves the highest \(R^2\) across clusters, while KNN yields a lower Mean Absolute Error (MAE) in specific segments. This hybrid strategy shows potential for robust energy modeling under variable environmental conditions and limited data availability. Future work includes integrating temporal-aware models and extending the approach to diverse geographic and photovoltaic configurations.
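The cluster-then-regress idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single hypothetical predictor (irradiance) and target (power), uses synthetic noise-free data, and fits only the linear model without intercept, omitting KNN, decision trees, hyperparameter search, cross-validation, and the statistical tests. All function and variable names are illustrative.

```python
from statistics import mean


def nearest(centroids, v):
    """Index of the centroid closest to the scalar v."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - v))


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on scalar values; returns k centroids."""
    ordered = sorted(values)
    # Deterministic start: pick k points spread evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(centroids, v)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_slope(xs, ys):
    """Least-squares slope b of y = b * x (linear regression without intercept)."""
    denom = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / denom if denom else 0.0


def fit_hybrid(xs, ys, k=3):
    """Cluster the predictor, then fit one local slope per cluster."""
    centroids = kmeans_1d(xs, k)
    slopes = []
    for c in range(k):
        members = [(x, y) for x, y in zip(xs, ys) if nearest(centroids, x) == c]
        slopes.append(fit_slope([m[0] for m in members], [m[1] for m in members]))
    return centroids, slopes


def predict(model, x):
    """Route x to its nearest cluster and apply that cluster's local model."""
    centroids, slopes = model
    return slopes[nearest(centroids, x)] * x


def mae(model, xs, ys):
    """Mean Absolute Error of the model over paired samples."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))


# Synthetic data (an assumption, not the Galicia dataset): three irradiance
# regimes, each with its own irradiance-to-power slope.
xs = list(range(10, 110, 10)) + list(range(300, 510, 20)) + list(range(800, 1010, 20))
ys = [(0.5 if x < 200 else 0.8 if x < 650 else 1.0) * x for x in xs]

hybrid = fit_hybrid(xs, ys, k=3)
single = ([0], [fit_slope(xs, ys)])  # one global model, for comparison
print("hybrid MAE:", mae(hybrid, xs, ys))
print("global MAE:", mae(single, xs, ys))
```

On data like this, where each regime follows its own relationship, the per-cluster models fit more closely than one global model. On real measurements the gain depends on how well the clusters align with actual operating regimes, which is what the cluster-count comparison in the study evaluates.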
External IDs: dblp:conf/hais/DiazLabradorDPFC25