Applied Science and Convergence Technology 2022; 31(4): 93-98
Published online July 30, 2022
https://doi.org/10.5757/ASCT.2022.31.4.93
Copyright © The Korean Vacuum Society.
Chunhyun Paik^{a}, Yongjoo Chung^{b}, and Young Jin Kim^{c,∗}
^{a}Division of Industrial Convergence Systems Engineering, Dongeui University, Busan 47340, Republic of Korea
^{b}Department of e-Business, Busan University of Foreign Studies, Busan 46234, Republic of Korea
^{c}Department of Systems Management and Engineering, Pukyong National University, Busan 48513, Republic of Korea
Correspondence to: youngk@pknu.ac.kr
Many countries have strived to expand the adoption of renewable power generation in the transition to a low-carbon society. Wind power is recognized as one of the most promising and scalable renewable energy sources for power generation, but the amount of wind power generated depends heavily on wind speed. Therefore, techniques that enable the reliable estimation of wind speed have long been a focus of research. In this study, statistically appropriate probability distribution functions were explored using wind speed measurement data from wind farm sites in the Republic of Korea (ROK). In particular, the problem of overfitting was investigated in depth by evaluating the fit of distributions with different numbers of parameters. The suitability of mixed distributions was examined statistically using information criteria until suitable distributions were established. The results indicate that monthly wind speed data are well fitted by distinct Weibull distributions; thus, planning for wind power generation in the ROK should consider the temporal variations in wind speed revealed by the distribution analyses.
Keywords: Wind farm, Wind speed, Weibull distribution, Distribution fitting, Goodness-of-Fit
Global warming, which is mainly caused by the excessive emission of greenhouse gases (GHGs), is considered one of the greatest environmental threats worldwide. The Intergovernmental Panel on Climate Change was established in 1988 to assess climate change, and the Kyoto Protocol, under the United Nations Framework Convention on Climate Change (adopted in 1992), was signed in 1997 and later succeeded by the Paris Agreement, which entered into force in 2016. Under the agreement, member countries of the Conference of the Parties (COP) are required to submit national GHG inventories that account for GHG emissions from different sectors. These inventories show that the power generation sector is responsible for a significant portion of GHG emissions, and many countries are therefore striving for the large-scale adoption of renewable power generation. Although the intermittency of renewables poses great challenges in energy system planning, the share of power generation from renewable resources is increasing at a remarkable pace. In particular, the use of wind power has many environmental and societal benefits, and wind energy is regarded as the most mature form of renewable energy from a techno-economic perspective [1]. It has also been noted that wind generation may have a low impact on climate change while offering various benefits for the environment, economy, and society [2]. Thus, wind power is considered the most promising renewable resource in terms of potential installed capacity [3].
The Korean government recently announced the 9th Master Plan for Long-Term Electricity Supply and Demand, which calls for approximately 26 % of electricity to come from renewable generation by 2034, 91 % of which will be supplied by solar and wind power [4]. In particular, with a projected share of more than 30 % of renewable energy generation, wind power is expected to become a critical form of power generation. Accordingly, the estimation and prediction of wind turbine output have been the subject of increasing research to better account for the intermittency of wind and variable wind speeds. A wide variety of models and methods have been proposed to predict the power output from wind generation under a range of circumstances [5–7]. The amount of power that can be harvested from the wind largely depends on wind speed and the specifications of the wind turbine, such as its size and blade length. Thus, accurate estimation of wind speed is imperative for projecting the power that can be generated. Wind speed can generally be considered a random variable following a specific distribution and can be modeled with different probability distribution functions, of which the Weibull distribution is the most widely adopted [8–12]. In previous studies, 2-parameter, 3-parameter, and mixed Weibull distributions have been investigated for their fit to wind speed data with reference to different criteria [13–15]. It should be noted, however, that most previous studies [8–15] tested the goodness-of-fit (GOF) of wind speed data against a single preselected distribution and employed different criteria, resulting in a lack of consistency in interpretation and comparison. This study was designed to derive wind speed probability distributions from a statistical point of view based on measurement data from wind farms in the Republic of Korea (ROK).
Specifically, a formal statistical procedure for the GOF test was employed to identify the most appropriate distribution function by considering temporal and seasonal variations. When no suitable single distribution can be identified, often because of distinct variability over a relatively short period, a mixture of different distributions can be explored; however, this may lead to overfitting, particularly when comparing the GOF of distributions with different numbers of parameters. Therefore, in this study, the Akaike and Bayesian information criteria are proposed for scrutinizing the GOF of mixed distributions.
Even though effective estimation of the wind speed distribution requires data with a sufficiently high spatiotemporal resolution, available data on the operation of wind turbines at wind farm sites are often limited because of the closed nature of the domestic electricity market. In the ROK, because the amount of electricity generated directly affects the purchase price paid by the government-owned distribution company (the Korea Electric Power Corporation, KEPCO), such datasets are difficult to obtain owing to the business confidentiality policies of private electricity generation companies. Therefore, publicly available datasets form the main source of information about wind power generation in the ROK, complemented by limited information obtained through restricted access granted by power generation companies. The daily average wind speeds of individual wind turbines at three wind farm sites in the ROK, Hankyung (HK), Sungsan (SS), and Taebaek (TB), were obtained for 2018–2020 (2019–2020 for TB), as shown in Table I. Note that one of the nine turbines at the HK site was excluded from the analysis because its data were unavailable.
Table I. Daily wind speed data from three wind farms in the Republic of Korea.
Wind farm | Number of turbines | Capacity (MW) | Data collection period |
---|---|---|---|
Hankyung | 8 | 19.5 | 2018.1.1–2020.12.31 |
Sungsan | 10 | 20 | 2018.1.1–2020.12.31 |
Taebaek | 8 | 16 | 2019.1.1–2020.11.30 |
It is assumed that daily wind speeds within a given month are identically distributed; thus, the distribution is estimated for each month at each site. This assumption accounts for seasonal (i.e., monthly) variations in wind speed while still providing a sufficiently large number of samples. For example, 93 samples were available for January at the HK and SS sites (93 = 31 days/year × 3 years). Figure 1 depicts the variations in daily wind speeds at the different sites, and Fig. 2 compares the monthly average daily wind speeds. Significant differences were observed between the monthly average wind speeds, which indicates that seasonal variations must be accounted for.
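The monthly sample sizes follow directly from the collection periods in Table I. The minimal sketch below counts them, assuming exactly one daily-average value per day with no missing days (the actual records may contain gaps); the helper name `samples_per_month` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def samples_per_month(start: date, end: date) -> Counter:
    """Count daily observations falling in each calendar month (1-12), start and end inclusive.

    Assumes exactly one daily-average wind speed per day and no missing days.
    """
    counts = Counter()
    day = start
    while day <= end:
        counts[day.month] += 1
        day += timedelta(days=1)
    return counts


# Collection periods from Table I
hk = samples_per_month(date(2018, 1, 1), date(2020, 12, 31))  # same period for SS
tb = samples_per_month(date(2019, 1, 1), date(2020, 11, 30))
print(hk[1], hk[2], tb[1], tb[12])  # → 93 85 62 31
```

Because the TB record covers only two Januaries, the January sample at that site is smaller (62 days) than at HK and SS.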
When estimating wind speed distributions, possible candidate distribution functions must first be explored. Generally, descriptive statistics based on empirical distribution functions can be effective in checking the normality of the dataset, as shown in Fig. 3(a). Higher-order moments, such as skewness and kurtosis, are also useful for identifying candidate distributions using the Cullen–Frey graph, which plots kurtosis against squared skewness, as depicted in Fig. 3(b) [16]. For example, the candidate group for the wind speed distribution in February at the SS site comprised the normal distribution (skewness 0 and kurtosis 3), the lognormal distribution, and the Weibull distribution. Further details on the construction and interpretation of these graphs are provided elsewhere [16].
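The coordinates placed on a Cullen–Frey graph can be sketched as follows. This assumes simple (biased) moment estimators; software such as R's fitdistrplus may apply bias corrections, so small-sample values can differ slightly. The second function gives the theoretical point of a Weibull distribution for a given shape, which traces the Weibull curve on the graph as the shape varies. Both function names are illustrative.

```python
import math


def sample_moments(x):
    """Return (skewness^2, kurtosis) from simple moment estimators, as plotted on a Cullen-Frey graph."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # non-excess kurtosis: equals 3 for a normal distribution
    return skew ** 2, kurt


def weibull_moments(shape):
    """Theoretical (skewness^2, kurtosis) of a Weibull distribution.

    Both depend only on the shape parameter (not on scale or threshold).
    """
    g1, g2, g3, g4 = (math.gamma(1.0 + i / shape) for i in range(1, 5))
    var = g2 - g1 ** 2
    skew = (g3 - 3 * g1 * g2 + 2 * g1 ** 3) / var ** 1.5
    kurt = (g4 - 4 * g1 * g3 + 6 * g1 ** 2 * g2 - 3 * g1 ** 4) / var ** 2
    return skew ** 2, kurt


print(sample_moments([1, 2, 3, 4, 5]))  # symmetric data: squared skewness is 0
print(weibull_moments(1.0))  # exponential special case: skewness 2, kurtosis 9
```

A sample whose point lies near the Weibull curve, rather than near the normal point (0, 3), suggests keeping the Weibull distribution as a candidate.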
Once several candidate distributions were identified, their parameters were estimated using classical methods, such as maximum likelihood and the method of moments. Maximum likelihood estimation is the most popular choice when the sample size exceeds 30 and was employed in this study unless otherwise specified. Statistical analyses to estimate the monthly wind speed distribution at each wind farm site were conducted using MINITAB Release 19 and R 4.1.2.
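For the 2-parameter Weibull distribution, the maximum likelihood estimates have no closed form, but the shape parameter can be obtained by solving a one-dimensional score equation. The pure-Python sketch below illustrates that standard approach. It is not the MINITAB or R routine used in the study, and the search bracket [0.05, 50] for the shape parameter and the function name are assumptions.

```python
import math
import random


def fit_weibull2(x, lo=0.05, hi=50.0, tol=1e-10):
    """Maximum likelihood estimates (shape, scale) of a 2-parameter Weibull distribution.

    The shape k solves the profile score equation
        sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0,
    whose left-hand side increases with k, so bisection on [lo, hi] is used.
    The scale then follows in closed form: (mean(x^k))^(1/k). Data must be > 0.
    """
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    log_max = max(logs)

    def weights(k):
        # x^k rescaled by max(x)^k to avoid overflow; the ratios below are unchanged.
        return [math.exp(k * (lv - log_max)) for lv in logs]

    def score(k):
        w = weights(k)
        return sum(wi * lv for wi, lv in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    scale = math.exp(log_max) * (sum(weights(k)) / len(logs)) ** (1.0 / k)
    return k, scale


# Synthetic check (not the study's data): true shape 2.5, true scale 8.0.
rng = random.Random(42)
sample = [rng.weibullvariate(8.0, 2.5) for _ in range(5000)]  # arguments: (scale, shape)
k_hat, lam_hat = fit_weibull2(sample)
print(round(k_hat, 2), round(lam_hat, 2))
```

Bisection is used instead of Newton's method because the score is monotone in the shape parameter, so convergence is guaranteed once the bracket contains the root.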
The GOF of fitted distribution functions is often assessed using graphical and analytical approaches. One of the most popular graphical approaches is to compare the empirical and theoretical distributions, as demonstrated in Fig. 4. The most widely used graphs include probability density plots, cumulative distribution plots, and quantile–quantile (Q–Q) plots.
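As an illustration of the Q–Q comparison, the sketch below pairs each ordered observation with the corresponding quantile of a fitted Weibull distribution. It assumes the plotting positions (i − 0.5)/n, which is only one of several common conventions; the function names are hypothetical.

```python
import math


def weibull_quantile(p, shape, scale, threshold=0.0):
    """Inverse CDF of the Weibull distribution; threshold=0 gives the 2-parameter case."""
    return threshold + scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def qq_points(x, shape, scale, threshold=0.0):
    """(theoretical, empirical) quantile pairs for a Q-Q plot of data x against a fitted Weibull.

    Uses plotting positions p_i = (i - 0.5) / n. Points close to the
    45-degree line indicate that the fitted distribution describes the data well.
    """
    xs = sorted(x)
    n = len(xs)
    return [(weibull_quantile((i - 0.5) / n, shape, scale, threshold), v)
            for i, v in enumerate(xs, start=1)]
```

Systematic departures of the extreme points from the line are the graphical counterpart of the tail discrepancies that the A–D test penalizes.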
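Although these graphical approaches provide intuitive insights that can help guide subsequent analysis, they have limited quantitative value. A more rigorous analytical approach to evaluating the GOF involves statistical hypothesis testing. In this case, the null hypothesis, $H_0$, is that the data follow the specified distribution, and it is rejected when the p-value of the test statistic is below the chosen significance level. Two widely used test statistics are the Kolmogorov–Smirnov (K–S) and Anderson–Darling (A–D) statistics:
$$D_n = \sup_x \left| F_n(x) - F(x) \right|, \qquad A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F\left(x_{(i)}\right) + \ln\left\{1 - F\left(x_{(n+1-i)}\right)\right\}\right],$$
where $F_n(x)$ is the empirical distribution function of the sample, $F(x)$ is the cumulative distribution function of the hypothesized distribution, $x_{(1)} \le \cdots \le x_{(n)}$ are the ordered observations, and $n$ is the sample size.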
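The K–S and A–D statistics can be computed from the sorted sample and a fitted CDF using their standard formulas, as in the minimal sketch below. It computes only the statistics: when the distribution parameters are estimated from the same data, the p-values require adjusted critical values (such as those reported by statistical software), which are not reproduced here. The function names are illustrative.

```python
import math


def ks_statistic(x, cdf):
    """Kolmogorov-Smirnov statistic D_n = sup |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = cdf(v)
        # The supremum is attained just before or at a jump of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(x, cdf):
    """Anderson-Darling statistic A^2, which weights discrepancies in the tails more heavily.

    Requires 0 < cdf(v) < 1 for every observation v.
    """
    xs = sorted(x)
    n = len(xs)
    u = [cdf(v) for v in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n
```

The weight $(2i-1)$ combined with the logarithms of $F$ and $1-F$ makes $A^2$ react strongly to misfit near $F \approx 0$ and $F \approx 1$, which is why the A–D test is the more demanding of the two in the tails.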
Considering that distributions with fewer parameters are preferred in practice, three different 2-parameter distributions, namely normal, lognormal, and 2-parameter Weibull, were explored first to test the GOF for the wind speed data. These candidate distributions were chosen based on a preliminary descriptive statistical analysis and the results of previous studies. Wind speed data from the HK site were used to compare two GOF tests with a significance level of 5 %.
The candidate 2-parameter distributions were tested for their GOF against the monthly wind speed data, and the results are summarized in Table II. The wind speed data were fitted to the different distributions month by month. While the K–S test did not reject at least one of the three distributions for every month, the A–D test provided more conservative results: the lognormal distribution was not rejected in any month except March, June, and November, and in half of the months it was the only distribution not rejected. None of the three distributions fitted the wind speed data for June and November, which may be attributed to the greater weight that the A–D test assigns to the tails of the distribution, where the distributional behavior of wind speed data is better represented. For November at the HK site, none of the three distributions was rejected by the K–S test, but all were rejected by the A–D test, and noticeable differences between the tails of the empirical and theoretical distributions were observed (Fig. 5).
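The month-by-month decision rule can be reproduced from the reported A–D p-values. The sketch below transcribes the Hankyung A–D p-values from Table II, rejects $H_0$ when p < 0.05, and lists the distributions that remain for each month. Treating bounds such as "<0.010" as conclusive only when they fall on one side of the significance level is an assumption of this sketch, as are the names used.

```python
ALPHA = 0.05
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# A-D p-values for the Hankyung site, transcribed from Table II.
HK_AD_P = {
    "Normal": ["<0.005", "0.139", "0.021", "0.036", "<0.005", "<0.005",
               "<0.005", "<0.005", "<0.005", "0.136", "<0.005", "0.009"],
    "Lognormal": ["0.746", "0.382", "0.047", "0.069", "0.161", "0.035",
                  "0.369", "0.104", "0.857", "0.084", "0.024", "0.297"],
    "Weibull": ["<0.010", ">0.500", "0.205", ">0.500", "<0.010", "<0.010",
                "<0.010", "0.012", "<0.010", ">0.250", "<0.010", "0.038"],
}


def rejected(p, alpha=ALPHA):
    """True if H0 is rejected at level alpha (p < alpha).

    Reported bounds such as '<0.010' or '>0.250' are accepted only when they
    settle the decision; otherwise a ValueError is raised.
    """
    if p[0] in "<>":
        bound = float(p[1:])
        if p[0] == "<" and bound <= alpha:
            return True
        if p[0] == ">" and bound >= alpha:
            return False
        raise ValueError(f"p-value bound {p!r} is inconclusive at alpha={alpha}")
    return float(p) < alpha


# Distributions NOT rejected by the A-D test, month by month.
accepted = {
    month: [dist for dist, ps in HK_AD_P.items() if not rejected(ps[i])]
    for i, month in enumerate(MONTHS)
}
for month, dists in accepted.items():
    print(month, dists or "none")
```

Running it shows empty lists for June and November and the lognormal distribution as the only survivor in six of the twelve months, matching the discussion of Table II.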
Table II. Goodness-of-fit tests for wind speed data from Hankyung using 2-parameter distributions.
Mon. | Normal A–D | Normal K–S | Lognormal A–D | Lognormal K–S | Weibull A–D | Weibull K–S |
---|---|---|---|---|---|---|
Jan. | 1.467 (<0.005)* | 0.095 (0.184) | 0.247 (0.746) | 0.065 (0.456) | 1.559 (<0.010)* | 0.100 (0.154) |
Feb. | 0.566 (0.139) | 0.071 (0.420) | 0.387 (0.382) | 0.046 (0.696) | 0.320 (>0.500) | 0.057 (0.578) |
Mar. | 0.901 (0.021)* | 0.085 (0.263) | 0.756 (0.047)* | 0.759 (<0.001)* | 0.512 (0.205) | 0.062 (0.492) |
Apr. | 0.805 (0.036)* | 0.061 (0.510) | 0.692 (0.069) | 0.052 (0.618) | 0.461 (>0.500) | 0.049 (0.650) |
May | 2.51 (<0.005)* | 0.143 (0.022)* | 0.541 (0.161) | 0.052 (0.609) | 1.43 (<0.010)* | 0.113 (0.092) |
Jun. | 3.675 (<0.005)* | 0.181 (0.003)* | 0.807 (0.035)* | 0.092 (0.217) | 2.315 (<0.010)* | 0.148 (0.019)* |
Jul. | 2.431 (<0.005)* | 0.112 (0.096) | 0.393 (0.369) | 0.065 (0.457) | 1.200 (<0.010)* | 0.083 (0.279) |
Aug. | 1.796 (<0.005)* | 0.093 (0.198) | 0.620 (0.104) | 0.062 (0.491) | 0.994 (0.012)* | 0.070 (0.406) |
Sep. | 3.537 (<0.005)* | 0.140 (0.029)* | 0.210 (0.857) | 0.042 (0.725) | 1.906 (<0.010)* | 0.097 (0.186) |
Oct. | 0.570 (0.136) | 0.071 (0.393) | 0.056 (0.084) | 0.046 (0.676) | 0.377 (>0.250) | 0.059 (0.519) |
Nov. | 1.569 (<0.005)* | 0.118 (0.080) | 0.872 (0.024)* | 0.082 (0.297) | 1.148 (<0.010)* | 0.105 (0.136) |
Dec. | 1.052 (0.009)* | 0.128 (0.049)* | 0.433 (0.297) | 0.059 (0.532) | 0.799 (0.038)* | 0.113 (0.096) |
* Rejection of the null hypothesis at the 5 % significance level. A–D, Anderson–Darling test; K–S, Kolmogorov–Smirnov test.
Each cell gives the test statistic, with the p-value in parentheses.
Because the A–D test may be considered more rigorous than the K–S test for fitting wind speed data, the wind speed data from the other two sites were tested for their fit to the three 2-parameter distributions using the A–D test, as summarized in Table III. No single distribution fitted the wind speed data from these sites in all months; as such, different distributions must be employed to model the wind speeds at different times of the year. In other words, spatiotemporal variations in wind speed may not be properly represented by a single specific distribution, which contradicts the assumption made in previous studies that wind speed follows a 2-parameter Weibull distribution. It is well known that the GOF can generally be improved by employing distribution functions with more parameters within the same distribution family. Therefore, 3-parameter Weibull distributions were subsequently investigated.
Table III. A–D test results of wind speed data from Sungsan and Taebaek using 2-parameter distributions.
Mon. | Sungsan: Normal | Sungsan: Lognormal | Sungsan: Weibull | Taebaek: Normal | Taebaek: Lognormal | Taebaek: Weibull |
---|---|---|---|---|---|---|
Jan. | 0.231 (0.799) | 0.652 (0.086) | 0.284 (>0.250) | 0.924 (0.018)* | 1.488 (<0.005)* | 0.906 (0.020)* |
Feb. | 0.570 (0.135) | 0.554 (0.148) | 0.453 (>0.250) | 0.309 (0.547) | 0.405 (0.342) | 0.232 (>0.250) |
Mar. | 1.078 (0.008)* | 0.491 (0.214) | 0.749 (0.048)* | 1.284 (<0.005)* | 0.897 (0.021)* | 0.984 (0.013)* |
Apr. | 0.586 (0.123) | 0.638 (0.093) | 0.344 (>0.250) | 1.350 (<0.005)* | 0.363 (0.430) | 0.99 (0.012)* |
May | 1.620 (<0.005)* | 0.792 (0.039)* | 1.071 (<0.003)* | 2.066 (<0.005)* | 0.839 (0.029)* | 1.479 (<0.010)* |
Jun. | 4.114 (<0.005)* | 1.12 (0.006)* | 3.096 (<0.010)* | 2.890 (<0.005)* | 1.227 (<0.005)* | 2.406 (<0.010)* |
Jul. | 1.286 (<0.005)* | 0.671 (0.077) | 0.820 (0.033)* | 1.508 (<0.005)* | 0.344 (0.477) | 0.867 (0.024)* |
Aug. | 1.877 (<0.005)* | 0.226 (0.814) | 1.179 (<0.010)* | 0.912 (0.019)* | 0.629 (0.097) | 0.679 (0.075) |
Sep. | 2.638 (<0.005)* | 0.28 (0.635) | 1.785 (<0.010)* | 3.958 (<0.005)* | 1.024 (0.010)* | 2.723 (<0.010)* |
Oct. | 1.172 (<0.005)* | 0.731 (0.055) | 1.178 (<0.010)* | 1.440 (<0.005)* | 0.381 (0.392) | 1.129 (<0.010)* |
Nov. | 1.237 (<0.005)* | 0.660 (0.082) | 0.913 (0.020)* | 1.584 (<0.005)* | 0.342 (0.480) | 1.192 (<0.010)* |
Dec. | 1.711 (<0.005)* | 0.375 (0.407) | 1.301 (<0.010)* | 1.146 (<0.005)* | 0.834 (0.028)* | 1.492 (<0.010)* |
* Rejection of the null hypothesis at the 5 % significance level.
Each cell gives the A–D statistic, with the p-value in parentheses.
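In addition to the shape and scale parameters of the 2-parameter Weibull distribution, the 3-parameter Weibull distribution includes a threshold parameter, and its probability density function is
$$f(x;\beta,\lambda,\tau) = \frac{\beta}{\lambda}\left(\frac{x-\tau}{\lambda}\right)^{\beta-1}\exp\left[-\left(\frac{x-\tau}{\lambda}\right)^{\beta}\right], \qquad x > \tau,$$
where β, λ, and τ denote the shape, scale, and threshold parameters, respectively. Note that β > 0, λ > 0, and $f(x) = 0$ for $x \le \tau$; setting τ = 0 yields the 2-parameter Weibull distribution. The A–D test results for the 3-parameter Weibull distributions are summarized in Table IV.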
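The 3-parameter density and its CDF, $F(x) = 1 - \exp[-((x-\tau)/\lambda)^{\beta}]$ for $x > \tau$, translate directly into code. The sketch below evaluates them with the Table VI parameters for January at Hankyung; the function names are hypothetical, and the density is set to 0 at the single point $x = \tau$ for simplicity.

```python
import math


def weibull3_pdf(x, shape, scale, threshold=0.0):
    """3-parameter Weibull density; 0 for x <= threshold (the single point x = tau is set to 0)."""
    if x <= threshold:
        return 0.0
    z = (x - threshold) / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-z ** shape)


def weibull3_cdf(x, shape, scale, threshold=0.0):
    """F(x) = 1 - exp(-((x - tau) / lambda)^beta) for x > tau, and 0 otherwise."""
    if x <= threshold:
        return 0.0
    return 1.0 - math.exp(-((x - threshold) / scale) ** shape)


# Table VI parameters for January at Hankyung: (beta, lambda, tau) = (1.833, 5.042, 3.869).
JAN_HK = (1.833, 5.042, 3.869)
print(weibull3_pdf(3.0, *JAN_HK))            # below the threshold, so 0.0
print(weibull3_cdf(3.869 + 5.042, *JAN_HK))  # F(tau + lambda) = 1 - e^-1
```

With threshold=0 the same functions reduce to the 2-parameter Weibull distribution, mirroring the nesting noted in the text.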
Table IV. A–D test results of wind speed data for 3-parameter Weibull distributions.
Mon. | Wind Farm Site | |||||
---|---|---|---|---|---|---|
Hankyung | Sungsan | Taebaek | ||||
Statistic | Statistic | Statistic | ||||
Jan. | 0.250 | 0.000* | 0.187 | 0.177 | 1.528 | 0.057 |
(>0.500) | (>0.500) | (<0.005)* | ||||
Feb. | 0.143 | 0.093 | 0.311 | 0.134 | 0.169 | 0.110 |
(>0.500) | (>0.500) | (>0.500) | ||||
Mar. | 0.419 | 0.019* | 0.374 | 0.013* | 0.664 | 0.001* |
(0.353) | (0.438) | (0.089) | ||||
Apr. | 0.337 | 0.034* | 0.359 | 0.012* | 0.178 | 0.000* |
(>0.500) | (0.466) | (>0.500) | ||||
May. | 0.423 | 0.000* | 0.576 | 0.002* | 0.605 | 0.000* |
(0.345) | (0.141) | (0.121) | ||||
Jun. | 0.757 | 0.000* | 0.757 | 0.000* | 0.598 | 0.000* |
(0.050) | (0.050) | (0.126) | ||||
Jul. | 0.242 | 0.000* | 0.471 | 0.000* | 0.369 | 0.010* |
(>0.500) | (0.252) | (0.447) | ||||
Aug. | 0.469 | 0.000* | 0.236 | 0.000* | 0.741 | 0.001* |
(0.256) | (>0.500) | (0.057) | ||||
Sep. | 0.450 | 0.000* | 0.572 | 0.000* | 0.514 | 0.000* |
(0.292) | (0.144) | (0.203) | ||||
Oct. | 0.247 | 0.062 | 0.921 | 0.149 | 0.222 | 0.000* |
(>0.500) | (0.015)* | (>0.500) | ||||
Nov. | 0.695 | 0.007* | 0.462 | 0.002* | 0.213 | 0.000* |
(0.076) | (0.269) | (>0.500) | ||||
Dec. | 0.373 | 0.008* | 0.232 | 0.000* | 0.464 | 0.000* |
(0.440) | (>0.500) | (0.265) |
* Rejection of the null hypothesis at the significance level 5 %.
For example, the wind speed data for January at the HK site were not adequately fitted by the 2-parameter Weibull distribution but were successfully fitted by the 3-parameter distribution at a significance level of 5 % (Tables II and IV). On the other hand, the wind speed data for February at the HK site were well fitted by both the 2- and 3-parameter Weibull distributions. Whenever the 2-parameter Weibull distribution properly represents the data, the 3-parameter distribution can be expected to fit at least as well, because it contains the 2-parameter distribution as the special case τ = 0; within the same distribution family, adding parameters cannot lower the maximized likelihood.
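The nesting argument can be illustrated with a simple profile-likelihood sketch: for each threshold on a grid, fit the shape and scale to the shifted data and keep the threshold with the highest log-likelihood. Because the grid includes τ = 0, the result can never have a lower log-likelihood than the 2-parameter fit. This is an illustrative approximation, not the estimation procedure of MINITAB or R used in the study; restricting τ to [0, min x), the grid size, and all names are assumptions.

```python
import math
import random


def weibull_loglik(x, shape, scale, threshold=0.0):
    """Log-likelihood of data x under a (3-parameter) Weibull distribution; requires x > threshold."""
    ll = 0.0
    for v in x:
        z = (v - threshold) / scale
        ll += math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape
    return ll


def fit_weibull2(x, lo=0.05, hi=50.0, tol=1e-8):
    """2-parameter Weibull MLE (shape, scale) by bisection on the profile score equation."""
    logs = [math.log(v) for v in x]
    mean_log, log_max = sum(logs) / len(logs), max(logs)

    def weights(k):
        return [math.exp(k * (lv - log_max)) for lv in logs]  # x^k / max(x)^k

    def score(k):
        w = weights(k)
        return sum(a * b for a, b in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if score(mid) < 0.0 else (lo, mid)
    k = 0.5 * (lo + hi)
    return k, math.exp(log_max) * (sum(weights(k)) / len(logs)) ** (1.0 / k)


def fit_weibull3_profile(x, n_grid=40):
    """Approximate 3-parameter fit: maximize the profile log-likelihood over thresholds in [0, min(x))."""
    x_min = min(x)
    best = None
    for j in range(n_grid):
        tau = x_min * j / n_grid  # j = 0 gives tau = 0, i.e. the 2-parameter model
        k, lam = fit_weibull2([v - tau for v in x])
        ll = weibull_loglik(x, k, lam, tau)
        if best is None or ll > best[0]:
            best = (ll, k, lam, tau)
    return best


# Synthetic data (not the study's): threshold 2.0, shape 1.8, scale 5.0.
rng = random.Random(1)
data = [2.0 + rng.weibullvariate(5.0, 1.8) for _ in range(300)]
ll3, k3, lam3, tau3 = fit_weibull3_profile(data)
k2, lam2 = fit_weibull2(data)
ll2 = weibull_loglik(data, k2, lam2)
print(round(ll2, 2), round(ll3, 2), round(tau3, 3))
```

The higher log-likelihood of the 3-parameter fit says nothing by itself about whether the extra parameter is worthwhile, which is why a penalized criterion is needed for the comparison.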
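When both the 2- and 3-parameter Weibull distributions fit the data properly, it is important to determine which of them should be used to model wind speeds. Therefore, simply comparing the GOF statistics of the two distributions may be misleading, because a distribution with more parameters tends to fit better and may overfit the data. Instead, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which penalize the number of parameters, can be used:
$$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L},$$
where $k$ is the number of estimated parameters, $n$ is the sample size, and $\hat{L}$ is the maximized value of the likelihood function. The model with the lower AIC or BIC is preferred.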
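Both criteria follow directly from a model's maximized log-likelihood, as in the sketch below, which uses the standard definitions AIC = 2k − 2 ln L̂ and BIC = k ln n − 2 ln L̂. As a consistency check, the gaps between BIC and AIC reported in Table V match k = 3 (3-parameter Weibull) and k = 5 (two-component mixture: one weight plus two shape and two scale parameters) with n = 93 for October at Sungsan (31 days × 3 years) and n = 62 for January at Taebaek (2 years). The log-likelihoods below are back-calculated from the reported AIC values, not taken from the study; the function name is hypothetical.

```python
import math


def aic_bic(log_likelihood, n_params, n_obs):
    """AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L; lower values indicate a preferred model."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


# Log-likelihoods implied by the Table V AIC values for October at Sungsan,
# assuming k = 3 (3-parameter Weibull), k = 5 (two-component mixture), and n = 93.
ll_w3 = (2 * 3 - 452.54) / 2   # -223.27
ll_mix = (2 * 5 - 438.32) / 2  # -214.16
print(aic_bic(ll_w3, 3, 93))   # approximately (452.54, 460.14)
print(aic_bic(ll_mix, 5, 93))  # approximately (438.32, 450.98)
```

Because ln 93 > 2, the BIC penalizes each extra parameter more heavily than the AIC does, so the BIC is the more conservative of the two criteria for these sample sizes.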
It is worth noting that the wind speed data for October at the SS site and January at the TB site could not be properly fitted by these distributions; further analysis confirmed that no suitable 2- or 3-parameter distribution could be identified. Indeed, a mixture of distributions may need to be explored when models based on a single distribution provide a poor characterization [20]. Here, a mixture of 2-parameter Weibull distributions, defined as the weighted sum of two or more 2-parameter Weibull distributions, was subsequently investigated; its density is as follows:
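$$f(x) = \sum_{i=1}^{m} \omega_i f_i(x;\beta_i,\lambda_i), \qquad f_i(x;\beta_i,\lambda_i) = \frac{\beta_i}{\lambda_i}\left(\frac{x}{\lambda_i}\right)^{\beta_i-1}\exp\left[-\left(\frac{x}{\lambda_i}\right)^{\beta_i}\right],$$
where $m$ is the number of components, $\omega_i \ge 0$ is the weight of the $i$-th component with $\sum_{i=1}^{m}\omega_i = 1$, and $\beta_i$ and $\lambda_i$ are its shape and scale parameters. For a two-component mixture ($m = 2$), $\omega_2 = 1 - \omega_1$. The 3-parameter and mixed Weibull distributions fitted to these two datasets were compared using the AIC and BIC, as summarized in Table V.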
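A two-component mixture can be evaluated by weighting the component densities and CDFs. The sketch below uses the Table VI parameters for January at Taebaek, with ω₂ = 1 − ω₁ = 0.790 derived from the reported ω₁; the function and variable names are hypothetical.

```python
import math


def weibull_pdf(x, shape, scale):
    """2-parameter Weibull density (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-z ** shape)


def weibull_cdf(x, shape, scale):
    """2-parameter Weibull CDF (0 for x <= 0)."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-(x / scale) ** shape)


def mixture_pdf(x, weights, shapes, scales):
    """f(x) = sum_i w_i f_i(x; beta_i, lambda_i); weights are nonnegative and sum to 1."""
    return sum(w * weibull_pdf(x, b, lam) for w, b, lam in zip(weights, shapes, scales))


def mixture_cdf(x, weights, shapes, scales):
    """F(x) = sum_i w_i F_i(x; beta_i, lambda_i)."""
    return sum(w * weibull_cdf(x, b, lam) for w, b, lam in zip(weights, shapes, scales))


# January at Taebaek (Table VI): omega_1 = 0.210,
# (beta_1, beta_2, lambda_1, lambda_2) = (9.621, 3.619, 3.642, 9.505).
TB_JAN = dict(weights=(0.210, 0.790), shapes=(9.621, 3.619), scales=(3.642, 9.505))
for speed in (2.0, 4.0, 8.0, 12.0):
    print(speed, round(mixture_pdf(speed, **TB_JAN), 4), round(mixture_cdf(speed, **TB_JAN), 4))
```

The narrow first component (large β₁, small λ₁) and the broader second component together produce a bimodal shape that no single 2- or 3-parameter Weibull density can reproduce.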
Table V. Comparison of the Akaike information criterion (AIC) and Bayesian information criterion (BIC).
Data | 3-parameter Weibull AIC | 3-parameter Weibull BIC | Mixed 2-parameter Weibull AIC | Mixed 2-parameter Weibull BIC |
---|---|---|---|---|
Oct. at Sungsan | 452.54 | 460.14 | 438.32 | 450.98 |
Jan. at Taebaek | 316.64 | 323.02 | 307.37 | 318.00 |
The AIC and BIC values for the mixed Weibull distribution were lower than those for the 3-parameter Weibull distribution, indicating that a mixture of 2-parameter Weibull distributions fits the data better than the 3-parameter Weibull distribution even after its additional parameters are penalized. This improvement does not necessarily mean that the mixed Weibull distribution provides the best possible fit to the data, but it indicates that the mixture can be considered a strong candidate distribution. Following these procedures, the most appropriate distributions for fitting the wind speed data were determined, as summarized in Table VI. Notably, the wind speed data for October at the SS site and January at the TB site were better represented by a mixture of two 2-parameter Weibull distributions (Fig. 6).
Table VI. Appropriate distribution functions for fitting monthly wind speed at three wind farm sites in the Republic of Korea.
Mon. | Hankyung | Sungsan | Taebaek |
---|---|---|---|
Jan. | 3-parameter (1.833, 5.042, 3.869) | 2-parameter (3.777, 9.338, NA) | Mixed with ω_{1} = 0.210 (9.621, 3.619, 3.642, 9.505) |
Feb. | 2-parameter (2.837, 8.491, NA) | 2-parameter (3.252, 9.012, NA) | 2-parameter (2.994, 7.491, NA) |
Mar. | 3-parameter (1.756, 5.737, 1.722) | 3-parameter (1.912, 6.288, 2.282) | 3-parameter (1.399, 4.502, 2.832) |
Apr. | 2-parameter (2.413, 6.929, NA) | 3-parameter (1.745, 5.654, 2.246) | 3-parameter (1.288, 3.723, 2.946) |
May | 3-parameter (1.361, 3.799, 1.761) | 3-parameter (1.639, 5.080, 2.052) | 3-parameter (1.438, 4.479, 2.379) |
Jun. | 3-parameter (1.412, 3.116, 1.371) | 3-parameter (1.308, 3.160, 2.289) | 3-parameter (1.100, 2.311, 2.677) |
Jul. | 3-parameter (1.341, 3.819, 1.527) | 3-parameter (1.354, 3.986, 2.425) | 3-parameter (1.615, 4.427, 1.368) |
Aug. | 3-parameter (1.282, 4.209, 1.936) | 3-parameter (1.471, 5.171, 2.204) | 3-parameter (1.292, 3.595, 2.714) |
Sep. | 3-parameter (1.306, 4.341, 1.759) | 3-parameter (1.401, 4.870, 2.297) | 3-parameter (1.056, 3.122, 2.398) |
Oct. | 2-parameter (2.720, 7.578, NA) | Mixed with ω_{1} = 0.940 (3.722, 18.908, 7.652, 14.462) | 3-parameter (1.258, 3.051, 2.780) |
Nov. | 3-parameter (1.770, 5.101, 2.056) | 3-parameter (1.181, 5.203, 2.387) | 3-parameter (1.477, 4.377, 2.799) |
Dec. | 3-parameter (1.932, 6.071, 2.480) | 3-parameter (1.519, 4.913, 3.332) | 3-parameter (1.547, 1.372, 4.397) |
* For the 2- and 3-parameter distributions, the values in parentheses correspond to (β, λ, τ).
** For the mixed distributions, ω_{1} is the weight of the first component, and the values in parentheses correspond to (β_{1}, β_{2}, λ_{1}, λ_{2}).
*** NA: Not applicable.
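As one example of how the parameters in Table VI might be used, the sketch below computes the mean wind speed implied by a fitted distribution, using the standard Weibull mean τ + λΓ(1 + 1/β) and, for mixtures, the weighted average of the component means. These model-implied means are not the measured monthly averages shown in Fig. 2, and the function names are hypothetical.

```python
import math


def weibull_mean(shape, scale, threshold=0.0):
    """E[X] = tau + lambda * Gamma(1 + 1/beta) for a (3-parameter) Weibull distribution."""
    return threshold + scale * math.gamma(1.0 + 1.0 / shape)


def mixture_mean(weights, shapes, scales):
    """Mean of a mixture of 2-parameter Weibull distributions: the weighted component means."""
    return sum(w * weibull_mean(b, lam) for w, b, lam in zip(weights, shapes, scales))


# Parameters read from Table VI.
feb_hk = weibull_mean(2.837, 8.491)         # Hankyung, Feb.: 2-parameter
jan_hk = weibull_mean(1.833, 5.042, 3.869)  # Hankyung, Jan.: 3-parameter
oct_ss = mixture_mean((0.940, 0.060), (3.722, 18.908), (7.652, 14.462))  # Sungsan, Oct.: mixed
print(round(feb_hk, 2), round(jan_hk, 2), round(oct_ss, 2))
```

For the 3-parameter case the threshold shifts the whole distribution, so it enters the mean additively.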
The accurate estimation of wind speed distributions is essential for the effective prediction and management of wind power generation, which plays a central role in expanding renewable power generation and mitigating GHG emissions. Wind speed varies in space and time, so its distribution can differ considerably between regions and over time. This study explored a formal statistical procedure for deriving appropriate distribution functions that fit monthly wind speed data obtained from three different wind farms in the ROK. The candidate distributions were first identified using descriptive statistical approaches and then tested for GOF using formal statistical tests. The A–D test may be considered more rigorous than the standard K–S test because it better captures the distributional behavior in the tails of the data distribution. Furthermore, 2- and 3-parameter Weibull distributions were compared for their GOF to the wind speed data, and the results indicated that the fit generally improves when additional parameters are included. When no suitable single distribution can be obtained, mixed Weibull distributions can be considered, and the risk of overfitting can be assessed using information criteria such as the AIC and BIC.
One of the major shortcomings of this study is its limited ability to account for possible overfitting and underfitting, which may be overcome with more extensive data collection. In addition, filtering outliers via data preprocessing may further improve the understanding of wind speed distributions, which warrants further investigation. Overall, the results of this study are expected to be extended to and linked with the derivation and prediction of power output from wind generation in the ROK and elsewhere.
This work was supported by a research grant from Pukyong National University (2021).
The authors declare no conflicts of interest.