Abstract:
Objective Wind speed and power data from wind farms often contain a large proportion of abnormal records. Effective data cleaning improves data quality and, in turn, the accuracy of downstream results such as renewable energy power forecasts and power generation forecasts.
Method This study collected wind speed and power data from typical wind farms in Ningxia and plotted wind speed versus power scatter diagrams to investigate the data distribution characteristics and the causes of abnormal data. We compared the effectiveness of the DBSCAN algorithm and the Variance Change Rate criterion in identifying and cleaning abnormal data. We then combined each of these methods with the Quartile Method to construct the "DBSCAN-Quartile Method" and "Variance Change Rate-Quartile Method" composite models. We also combined the DBSCAN algorithm with the Variance Change Rate criterion, in both orders, to develop the "DBSCAN-Variance Change Rate" and "Variance Change Rate-DBSCAN" composite models. The cleaning performance of these composite models was evaluated and compared with that of the single models.
Result The results show that: (1) Abnormal data points in the wind speed-power scatter plot of a wind farm can be roughly classified into three categories: bottom clustered data, middle clustered data, and other dispersed data. (2) Single models using either the DBSCAN algorithm or the Variance Change Rate criterion identify and clean most of the abnormal data, but some locally dispersed points remain undetected. (3) Compared with single models, composite models further identify a small number of abnormal points whose outlier characteristics are less obvious. Among the composite models, the "DBSCAN-Quartile Method" model yields the highest correlation coefficient between wind speed and power after cleaning and the lowest root mean square error relative to the modeling curve, giving the best cleaning effect.
Conclusion By comparing the cleaning effects of single models and composite models on abnormal operational data from wind farms, the study demonstrates that composite models reliably improve wind farm data quality. The results provide a useful reference for energy demand forecasting, wind power accommodation level assessment, and wind farm operation management.
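As an illustration of the Quartile Method stage used in the composite models, the following is a minimal sketch and not the authors' implementation. It assumes that samples are grouped into wind speed bins and that a power value is flagged when it falls outside the interquartile fences of its bin. The function name `quartile_clean`, the bin width of 0.5 m/s, the fence multiplier of 1.5, and the minimum bin size are all hypothetical choices made for this example; in a composite model, the input would be the points that survived a first stage such as DBSCAN.

```python
from collections import defaultdict
from statistics import quantiles


def quartile_clean(samples, bin_width=0.5, k=1.5, min_points=4):
    """Split (wind_speed, power) pairs into kept and flagged lists.

    Assumed rule (not taken from the paper): within each wind speed bin,
    a point is flagged when its power lies outside
    [Q1 - k * IQR, Q3 + k * IQR].
    """
    # Group samples by wind speed bin index.
    bins = defaultdict(list)
    for speed, power in samples:
        bins[int(speed // bin_width)].append((speed, power))

    kept, flagged = [], []
    for _, points in sorted(bins.items()):
        # Bins with too few points give unreliable quartiles; keep them as-is.
        if len(points) < min_points:
            kept.extend(points)
            continue
        q1, _, q3 = quantiles([power for _, power in points], n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        for point in points:
            (kept if low <= point[1] <= high else flagged).append(point)
    return kept, flagged


# Synthetic example: one bin near 5 m/s with a zero-power point,
# resembling the "bottom clustered" category of abnormal data.
samples = [
    (5.0, 500.0), (5.1, 510.0), (5.2, 505.0), (5.3, 495.0),
    (5.4, 502.0), (5.1, 498.0), (5.2, 0.0),
]
kept, flagged = quartile_clean(samples)
print(flagged)
```

The per-bin fences adapt to the local spread of the power curve, which is one plausible reason a quartile stage can catch dispersed points that a density-based first stage leaves behind.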