Comparative Study on Abnormal Data Cleaning and Quality Control Technologies for Wind Power in Wind Farms

Yuan Ruirui (袁瑞瑞); Zhang Yifei (张逸飞); Liu Jianhong (刘建宏); Yong Jia (雍佳); Pan Yaying (潘娅英)

doi:10.16516/j.ceec.2025-056

Abstract:
Objective: Wind speed and power data from wind farms often contain many abnormal records. Effective data cleaning improves data quality and, in turn, the accuracy of downstream results such as renewable energy power forecasts and power generation forecasts.
Method: This study collected wind speed and power data from typical wind farms in Ningxia and plotted wind speed-power scatter diagrams to examine the data distribution and the causes of abnormal data. We compared how well the DBSCAN algorithm and the variance change rate criterion identify and clean abnormal data. Each was then combined with the quartile method to construct the "DBSCAN-quartile method" and "variance change rate-quartile method" composite models. The DBSCAN algorithm and the variance change rate criterion were also chained in both orders to construct the "DBSCAN-variance change rate" and "variance change rate-DBSCAN" composite models. The cleaning performance of these composite models was evaluated and compared with that of the single models.
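As a rough illustration of the "DBSCAN-quartile method" combination described above, the following Python sketch first discards density-based noise and then applies quartile fences within wind speed bins. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the normalization by `v_max` and `p_rated`, the values of `eps` and `min_pts`, the 0.5 m/s bin width, the 1.5 × IQR fences, and the choice to keep only the largest cluster are all hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict
from statistics import quantiles


def dbscan_labels(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN. Returns one cluster id per point, -1 for noise."""
    eps2 = eps * eps

    def neighbours(i):
        xi, yi = points[i]
        # A point counts as its own neighbour, so min_pts includes the point itself.
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps2]

    labels = [None] * len(points)        # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:       # j is a core point, so keep expanding
                queue.extend(nb)
    return labels


def quartile_filter(records, bin_width=0.5, k=1.5):
    """Keep (speed, power) pairs whose power lies inside the IQR fences of its speed bin."""
    bins = defaultdict(list)
    for v, p in records:
        bins[int(v // bin_width)].append((v, p))
    kept = []
    for members in bins.values():
        if len(members) < 4:             # too few points to estimate quartiles; keep them
            kept.extend(members)
            continue
        q1, _, q3 = quantiles([p for _, p in members], n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        kept.extend((v, p) for v, p in members if low <= p <= high)
    return kept


def clean_dbscan_quartile(records, v_max, p_rated, eps=0.05, min_pts=5):
    """Stage 1: DBSCAN on normalized data. Stage 2: quartile fences per speed bin."""
    # Assumption: scale both axes to roughly [0, 1] so one eps applies to both.
    scaled = [(v / v_max, p / p_rated) for v, p in records]
    labels = dbscan_labels(scaled, eps, min_pts)
    sizes = defaultdict(int)
    for lab in labels:
        if lab != -1:
            sizes[lab] += 1
    if not sizes:
        return []
    # Assumption: the largest cluster is the main power-curve band.
    main = max(sizes, key=sizes.get)
    stage1 = [r for r, lab in zip(records, labels) if lab == main]
    return quartile_filter(stage1)
```

Calling `clean_dbscan_quartile(records, v_max=25.0, p_rated=1500.0)` on a list of `(wind_speed, power)` tuples returns the retained records. Running the quartile stage second lets it catch points that sit close enough to the main band to survive the density stage.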
Result: The study shows that: (1) Abnormal data points in the wind speed-power scatter plots of wind farms fall roughly into three categories: bottom-clustered data, middle-clustered data, and other dispersed data. (2) Single models based on either the DBSCAN algorithm or the variance change rate criterion identify and clean most of the abnormal data, but dispersed points in some local regions remain undetected. (3) Compared with single models, composite models additionally identify a small number of abnormal points whose outlier characteristics are less obvious. Among them, the "DBSCAN-quartile method" composite model yields the highest correlation coefficient between wind speed and power after cleaning and the lowest root mean square error relative to the modeling curve, giving the best cleaning performance.
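The two evaluation metrics named in result (3) could be computed along the following lines. This is only a sketch: `pearson_r` and `rmse_to_curve` are illustrative names, and `reference_curve` is a hypothetical callable standing in for the modeling curve, whose construction the abstract does not describe.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rmse_to_curve(records, reference_curve):
    """Root mean square error of measured power against reference_curve(speed)."""
    # Assumption: reference_curve maps a wind speed to the expected power.
    squared = [(p - reference_curve(v)) ** 2 for v, p in records]
    return sqrt(sum(squared) / len(squared))
```

Under this reading, a cleaning method performs better when the retained records give a higher `pearson_r` between wind speed and power and a lower `rmse_to_curve`.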
Conclusion: Comparing the cleaning performance of single and composite models on abnormal operating data from wind farms demonstrates that composite models reliably improve wind farm data quality. The results provide an important reference for energy demand forecasting, assessment of wind power accommodation levels, and wind farm operation management.
