Wind speed variability study based on the Hurst

INTRODUCTION

In 2017, Mexico increased its total wind power capacity to 3527 MW, representing about 4.7% of total generation capacity. Thanks to its abundant wind resource, the region of the Isthmus of Tehuantepec in the state of Oaxaca, Mexico, concentrates 76.8% of the total installed wind power. The wind in the southern region of the Isthmus of Tehuantepec, where the town of La Venta is located, has well‐defined seasonal behavior. The fall and winter seasons show the strongest and most continuous wind speeds of the year; from October to January, these winds can reach sustained speeds of over 20 m/s. The weakest winds occur during April, May, and June, with a peak in July and a decrease again in August and September. Interannual variability is also present, that is, some years have high wind speeds and others have low wind speeds. The average wind speed varies from month to month and from year to year, and the same tendency is observed from 1 week to the next and from 1 day to the next.

The analysis of wind speed time series has recently focused on the generation of new models of prediction using statistical or artificial intelligence techniques; their results are based on the amount of historical data available. Examples of these techniques are the autoregressive models of Box‐Jenkins, as well as those of artificial neural networks, genetic algorithms, and others. However, due to the high variability of the wind speed, it is reasonable to question how consistent a particular time series is, before attempting to model the data and trying to predict its behavior.

Harold Edwin Hurst developed a useful tool for this purpose, the statistical measure known as the Hurst exponent or Hurst coefficient, H. It is used to characterize the dependency of a series of data over a long period. Hurst used it particularly in the field of geophysics, where he had a generous amount of hydrologic data and was able to compare it with traditional methods that handled stochastic variables. The Hurst exponent can classify different time series and distinguish a random series from a nonrandom one, as well as determine whether or not a series has a Gaussian distribution. Hurst discovered that most natural systems do not follow a random trajectory; thus, H provides a numeric estimate of the degree of predictability of a time series.

In the 1960s, the mathematician Benoit Mandelbrot used the term fractal as an adjective to describe physical systems whose geometric complexity could be defined more precisely with a parameter known as the fractal dimension than with the traditional Euclidean dimension. The fractal dimension is a number that reflects the change in the topological measurement of a physical set at different scales and is directly related to the Hurst coefficient. The main attraction of fractal geometry is its ability to describe irregular forms and complex objects that cannot be defined by traditional Euclidean geometry. A tangible example is the dimension of a time series, which is an objective of this study.

Chang et al used the concept of fractal dimension to study the characteristics of wind speed at three different sites in Taiwan. Using only the box counting technique, they analyzed monthly and yearly window widths from a time series of 3 years. Their results indicate a significant dependency between fractal dimension and weather conditions. The value of the fractal dimension on an annual scale for the three sites ranges between 1.61 and 1.66, establishing a tendency of the series toward randomness. The authors state that the fractal dimension value increases by about 8% if the number of data used in the analysis is doubled.

Harrouni et al analyzed a time series of solar radiation to study persistence through the calculation of its fractal dimension. Daily and annual measurements collected over 14 years were analyzed using the rectangular covering method. Their results indicate that the solar time series on a daily and annual scale are antipersistent. Rehman et al studied the correlation between various meteorological variables at different stations in Saudi Arabia. They applied wavelet analysis to time series of pressure, temperature, rainfall, relative humidity, and wind speed, using daily averages over a period of 16 years. The result of the analysis shows either a strong correlation or anticorrelation depending on the timescale used. With regard to wind speed, the series indicates a strong correlation on a weekly scale.

Liaw et al presented a different technique for the calculation of the fractal dimension of a time series. This technique defines the errors of interpolation in some segments of the series and concludes that such errors follow a power law. In the conducted exercise, the procedure proved to be accurate and straightforward for the studied time series. Piacquadio and de la Barra applied multifractal analysis to study the wind velocity. They concluded that the wind spectrum is of Weierstrass type and compared different Weierstrass spectra with the wind spectrum, obtaining information on the Weierstrass parametric values that agree with the wind spectrum. Mahbub‐Alam et al studied the wind variability in both the time and spatial domains using wavelet transform and fast Fourier transform power spectrum techniques.

Telesca et al found that the dynamics of the wind speed time series have two characteristics at different timescales: persistence and multifractality at long timescales, and antipersistence and monofractality at short timescales. Wang et al proposed that, when applying the multifractal detrended fluctuation analysis (MDFA), the segment length (scale) into which the time series is divided must be longer than 1 day; this avoids discrepancies and sudden drops in the calculated scaling coefficients.

The number of data necessary to perform an adequate calculation of the fractal dimension depends on the nature of the time series, as well as on the proper identification of the cycles that occur in the studied series. Therefore, in this study, it was necessary to identify the cycles that occur throughout 1 year. This was possible because the time series comprised a little over 6 years of hourly measurements. Furthermore, as an additional support element for identifying the cyclic behavior of the time series, the random walk of the time series was obtained, which allows visualizing all the trajectories of the walk generated over the whole measurement period.

From the literature review, it can be seen that some studies analyze wind speed time series using the fractal dimension as a relevant parameter. In this study, an hourly time series with measurements taken over a period of a little over 6 years was examined. Daily, monthly, seasonal, and annual window widths of the hourly time series were analyzed with the aid of four different techniques. In the same way, the MDFA technique was used to investigate whether the time series were mono‐ or multifractal. On this basis, the most appropriate treatment of the time series was proposed.

In the same way, a simulation of a typical month was carried out using a self‐affine trace generator to demonstrate that by involving the Hurst coefficient in the generation of wind time series the typical variability of the time series can be calculated, which is not achieved by simulating the time series only using random numbers. All these analyses contribute to increasing the knowledge of the wind speed time series structure in the Isthmus of Tehuantepec region in Mexico.

The dataset from the wind measurements shows a bimodal behavior. This phenomenon was analyzed to know the way in which it would affect the modeling process. Therefore, in addition to calculating the Hurst coefficient, the MDFA technique was applied to determine if the time series requires more than one Hurst coefficient.

The wind measurement data with bimodal characteristics showed a multifractal structure. To model these time series, discordance tests were applied to eliminate outlying values and to determine the number of populations with normal characteristics contained in a sample. Once the populations have been identified, it is possible to make a model for each population.

This analysis is complemented with the calculation of the statistical measurements (mean, standard deviation, and coefficient of variation) of the analyzed time series and their comparison against each other. The statistical and fractal similarities of the different scales in the time series (invariance of the scaling) were likewise compared.

TECHNIQUES USED IN THE ANALYSIS

Five techniques were used to obtain the fractal dimension and the Hurst coefficient, H. These techniques were: the box counting method, the R/S rescaled range analysis, the power spectrum, detrended fluctuation analysis (DFA), and multifractal detrended fluctuation analysis (MDFA). The latter was used to determine whether the time series has multifractal properties.

The box counting method is widely used because of its simplicity and easy computer implementation. This method can be applied to structures with and without self‐similarity. It is also used when the structure of the series is very irregular. The studied objects can also be considered immersed in a higher‐dimensional space; for example, if the analysis is carried out in three‐dimensional space, the boxes are no longer flat but have height, width, and depth.

The rescaled range (R/S) technique, proposed by Mandelbrot and Wallis, is the best‐known method for determining the value of H. This method lets us calculate the self‐similarity parameter H, which measures the intensity of the long‐term dependence of a time series.

The third technique used was the power spectrum, which has its origin in spectral analysis and is the graphic representation of half the square of the spectral density as a function of frequency. It is mainly used in signal analysis to identify the frequencies that make up a signal.

The DFA technique, proposed by Peng et al, has been used to detect correlations in time series without stationarity. This technique has been successfully used in various fields of knowledge, such as heart rate dynamics, economic time series, etc.

The MDFA technique is a variation of the DFA technique. Commonly, it is used to determine if the time series is multifractal, meaning that the series must be represented with more than a single Hurst coefficient.

In this study, box counting, rescaled range, and power spectrum analyses were carried out using the Benoit software, which has been successfully used in several investigations. The DFA and MDFA techniques were analyzed using MATLAB software.

Box counting method

This method is one of the most widely used for estimating the fractal dimension of an image. The graphs of the time series were displayed at different scales so that the software could make the necessary calculations.

The method consists of placing the structure within a grid of side r and simply counting the number of flat boxes (squares) that contain part of the structure, as can be seen in Figure . The obtained value, N, is a function of r: N(r). If grids of increasingly smaller side r are used, the structure can be represented by a double‐logarithmic diagram of log N(r) as a function of log (1/r). The slope of the line fitted through the obtained points is the box counting dimension, D_C.

Box counting method
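
The counting procedure described above can be illustrated with a minimal sketch. It is not the Benoit implementation used in the study; the function names (`box_count`, `box_counting_dimension`), the grid sizes, and the requirement that the points are already scaled to the unit square are assumptions made for illustration. The sketch is checked on two sets whose dimension is known (a straight line and a filled square).

```python
import math

def box_count(points, n_boxes):
    """Count grid cells of side r = 1/n_boxes that contain at least one point.

    `points` are (x, y) pairs assumed to be scaled to the unit square.
    """
    occupied = set()
    for x, y in points:
        i = min(int(x * n_boxes), n_boxes - 1)
        j = min(int(y * n_boxes), n_boxes - 1)
        occupied.add((i, j))
    return len(occupied)

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def box_counting_dimension(points, grid_sizes=(2, 4, 8, 16, 32)):
    """Estimate D_C as the slope of log N(r) versus log(1/r)."""
    log_inv_r = [math.log(n) for n in grid_sizes]  # log(1/r) with r = 1/n
    log_n = [math.log(box_count(points, n)) for n in grid_sizes]
    return ols_slope(log_inv_r, log_n)

# Sanity checks on sets with a known dimension (illustrative data only).
line = [(k / 4096, k / 4096) for k in range(4096)]
square = [(i / 64 + 1 / 128, j / 64 + 1 / 128)
          for i in range(64) for j in range(64)]
d_line = box_counting_dimension(line)
d_square = box_counting_dimension(square)
```

For a plotted time series, the points would be the sampled curve (ideally with the connecting segments densely interpolated) rescaled so that both axes span the unit interval; the result depends on that rescaling and on the range of grid sizes chosen.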

R/S statistic

The following expression can define a set of observations: [Image Omitted. See PDF]

where X_k defines a time series, with a mean $\bar{X} (n)$ and a variance S(n)². The R/S statistic is given by:

[Image Omitted. See PDF]

where:[Image Omitted. See PDF]

The factor S(n), which represents the square root of the variance (standard deviation), is introduced for normalization purposes. The expression defines a random walk; therefore, R(n)/S(n) essentially characterizes the normalized range of the process, W_k. By analogy with the variance of a random walk, which scales linearly with n, the square of this rescaled measure is expected to scale as n^2H, thus producing the following expression: [Image Omitted. See PDF] where E is a constant of proportionality.

The accuracy in the determination of H, at all time intervals, depends on the number of data used in the calculation. If this number is reasonably large (this happens when the longest time interval is recorded several times), the R/S curve is expected to provide information on self‐similarity at all time intervals; but if the recorded series is correlated only in the short term, then the log‐log graph will be a straight line with a slope of 0.5.
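
As a hedged illustration of the R/S procedure, the sketch below computes the rescaled range over non‐overlapping windows and estimates H as the slope of log(R/S) against log n. The windowing scheme, the window sizes, and the function names are assumptions for illustration; the study itself used the Benoit software, whose exact algorithm is not given here. The synthetic white noise and its cumulative sum are test data, not wind measurements.

```python
import math
import random

def rescaled_range(chunk):
    """Return R/S for one window, or None if the window is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    if s == 0:
        return None
    w = w_min = w_max = 0.0  # cumulative deviation W_k and its extremes
    for x in chunk:
        w += x - mean
        w_min = min(w_min, w)
        w_max = max(w_max, w)
    return (w_max - w_min) / s

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def hurst_rs(series, window_sizes):
    """Estimate H as the slope of log(mean R/S) versus log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    return ols_slope(log_n, log_rs)

# Synthetic check: uncorrelated noise versus its (strongly persistent) running sum.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
walk, total = [], 0.0
for x in noise:
    total += x
    walk.append(total)
sizes = [16, 32, 64, 128, 256, 512]
h_noise = hurst_rs(noise, sizes)
h_walk = hurst_rs(walk, sizes)
```

For short windows the R/S estimate of uncorrelated data is known to sit somewhat above 0.5, so a value slightly larger than 0.5 for `h_noise` does not by itself indicate persistence.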

Power spectrum

The power spectrum of a periodic function f(t) is represented by the following expression:

[Image Omitted. See PDF]

where $P (ω)$ is the power spectrum, T is the function period, ω is the frequency, and t is the time.

Scheianu et al present the power spectrum using the sum of the Fourier transforms of each of the functions, thus obtaining the following representation:

[Image Omitted. See PDF]

where c and a are constants.

Zwiggelaar et al indicate that the Fourier transform and the fractal dimension of an image are related through the power spectrum. They also argue that the power spectrum of a fractal image is proportional to a power of the frequency, $ω^{d_{f}}$ , whose exponent d_f is linearly related to the fractal dimension.
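
The power‐law behavior of the spectrum can be sketched as follows. This is an illustrative periodogram estimate, not the formula of Scheianu et al or the Benoit routine: the spectral exponent β is taken as minus the slope of log P versus log f over the lowest frequencies, and the direct (slow) discrete Fourier sum, the number of frequencies fitted, and the function names are assumptions. The conversion H = (β − 1)/2 noted in the comments is the convention commonly used for self‐affine traces and is likewise an assumption here.

```python
import cmath
import math
import random

def periodogram(series, n_freqs):
    """Power |X_k|^2 / N at the n_freqs lowest nonzero Fourier frequencies."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, n_freqs + 1):
        step = -2j * math.pi * k / n
        coeff = sum(x * cmath.exp(step * t) for t, x in enumerate(centered))
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def spectral_exponent(series, n_freqs=128):
    """Return beta such that P(f) ~ f**(-beta) at low frequencies."""
    freqs, power = periodogram(series, n_freqs)
    return -ols_slope([math.log(f) for f in freqs],
                      [math.log(p) for p in power])

# Synthetic check: white noise (beta near 0) and its running sum (beta near 2).
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(1024)]
walk, total = [], 0.0
for x in noise:
    total += x
    walk.append(total)
beta_noise = spectral_exponent(noise)
beta_walk = spectral_exponent(walk)
# Assumed convention for a self-affine trace: H = (beta - 1) / 2, so beta = 2 gives H = 0.5.
h_walk = (beta_walk - 1.0) / 2.0
```

A raw periodogram is noisy, so in practice the spectrum is usually averaged or binned before the slope is fitted.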

Detrended fluctuation analysis

The DFA method is a modified root mean square (RMS) analysis of a random walk, starting with a signal, u(i), where i = 1,…, N; and N is the length of the signal. The first step of the DFA method is to integrate u(i) to obtain:[Image Omitted. See PDF]

where:[Image Omitted. See PDF]

The integrated profile y(i) was then divided into boxes of equal length, n. In each box, y(i) was fitted using a polynomial function y_n(i), which represents the local trend in that box. The method is denoted DFA‐l when a polynomial of order l is fitted.

Next, the integrated profile y(i) was detrended by subtracting the local trend y_n(i) in each box of length n:[Image Omitted. See PDF]

Finally, for each box length n, the RMS fluctuation for the integrated and detrended signal was calculated:[Image Omitted. See PDF]

The above calculation is then repeated for different box lengths n to obtain the behavior of F(n) over a broad range of scales. For scale‐invariant signals with power‐law correlations, there is a power‐law relationship between the RMS fluctuation function F(n) and the scale n: $F (n) \sim n^{α}$

Because power laws are scale invariant, F(n) is also called the scaling function and the parameter α is the scaling exponent. The value of α represents the degree of the correlation of the signal: if α = 0.5, the signal is uncorrelated (white noise); if α > 0.5, the signal is correlated; and if α < 0.5, the signal is anticorrelated.
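
The DFA steps can be summarized in a short sketch. It assumes first‐order detrending (a straight line in each box, DFA‐1) and non‐overlapping boxes; the box sizes and function names are illustrative choices, and this is not the MATLAB code used in the study. Uncorrelated noise should give a scaling exponent close to 0.5, and its running sum a value close to 1.5.

```python
import math
import random

def linear_residual_ss(segment):
    """Sum of squared residuals after a least-squares straight-line fit."""
    n = len(segment)
    mx = (n - 1) / 2
    my = sum(segment) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    sxy = sum((i - mx) * (y - my) for i, y in enumerate(segment))
    b = sxy / sxx
    return sum((y - my - b * (i - mx)) ** 2 for i, y in enumerate(segment))

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def dfa_exponent(signal, box_sizes):
    """Return the DFA-1 scaling exponent of `signal`."""
    mean = sum(signal) / len(signal)
    profile, acc = [], 0.0
    for u in signal:  # step 1: integrate the mean-removed signal
        acc += u - mean
        profile.append(acc)
    log_n, log_f = [], []
    for n in box_sizes:
        boxes = len(profile) // n
        total = sum(linear_residual_ss(profile[b * n:(b + 1) * n])
                    for b in range(boxes))  # steps 2-3: fit and detrend
        f_n = math.sqrt(total / (boxes * n))  # step 4: RMS fluctuation
        log_n.append(math.log(n))
        log_f.append(math.log(f_n))
    return ols_slope(log_n, log_f)  # slope of log F(n) versus log n

random.seed(2)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
walk, total = [], 0.0
for x in noise:
    total += x
    walk.append(total)
sizes = [16, 32, 64, 128, 256]
alpha_noise = dfa_exponent(noise, sizes)
alpha_walk = dfa_exponent(walk, sizes)
```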

Multifractal detrended fluctuation analysis

As mentioned earlier, MDFA technique is a variation of DFA. The following procedure was followed to analyze a time series using this technique:

The random walk was generated according to the equation: $Y (i) = \sum_{k = 1}^{i} (x_{k} - \bar{x})$ where Y(i) is the random walk, x_k are the components of the time series, and $\bar{x}$ is the average wind speed.
The profile obtained from the random walk was divided into equal segments of length s. [Image Omitted. See PDF]
The local trend for each of the 2N_s segments was calculated by a least‐squares fit. After that, the variance was determined: [Image Omitted. See PDF]  for each segment v, where v = 1, …, N_s; and [Image Omitted. See PDF]  for each segment v, where v = N_s + 1, …, 2N_s; here y_v(i) is the fitted polynomial in segment v.
An average over all the segments was computed to obtain the qth‐order fluctuation function, where q can take any real value except zero. [Image Omitted. See PDF]
The scaling behavior of the fluctuation functions was determined by analyzing log‐log plots of F_q(s) vs s for each value of q.

If the behavior of the fluctuation functions is steady and the slopes are equal (same H value), then the time series is monofractal, but if the fluctuation functions are different, then the time series is multifractal.
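
The procedure can be sketched in a few lines. The sketch assumes linear detrending in each segment, segments taken from both ends of the profile (giving 2N_s segments), and a small illustrative set of scales and q values; the names `mfdfa_exponents` and `linear_residual_ss` are hypothetical, and this is not the MATLAB implementation used in the study. For a monofractal test signal such as white noise, the generalized exponents h(q) should be nearly equal for all q.

```python
import math
import random

def linear_residual_ss(segment):
    """Sum of squared residuals after a least-squares straight-line fit."""
    n = len(segment)
    mx = (n - 1) / 2
    my = sum(segment) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    sxy = sum((i - mx) * (y - my) for i, y in enumerate(segment))
    b = sxy / sxx
    return sum((y - my - b * (i - mx)) ** 2 for i, y in enumerate(segment))

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def mfdfa_exponents(series, scales, q_values):
    """Return {q: h(q)}, the slope of log F_q(s) versus log s for each q != 0."""
    if any(q == 0 for q in q_values):
        raise ValueError("q = 0 needs a separate logarithmic average")
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for x in series:  # random-walk profile Y(i)
        acc += x - mean
        profile.append(acc)
    n = len(profile)
    log_s = [math.log(s) for s in scales]
    log_f = {q: [] for q in q_values}
    for s in scales:
        n_s = n // s
        variances = []
        for v in range(n_s):
            # Forward segments, then the same partition started from the end.
            variances.append(linear_residual_ss(profile[v * s:(v + 1) * s]) / s)
            variances.append(
                linear_residual_ss(profile[n - (v + 1) * s:n - v * s]) / s)
        for q in q_values:
            # Assumes every segment variance is positive (true for noisy data).
            avg = sum(f2 ** (q / 2) for f2 in variances) / len(variances)
            log_f[q].append(math.log(avg) / q)  # log of F_q(s)
    return {q: ols_slope(log_s, log_f[q]) for q in q_values}

random.seed(3)
noise = [random.gauss(0.0, 1.0) for _ in range(4000)]
h = mfdfa_exponents(noise, scales=[16, 32, 64, 128], q_values=[-2, 2])
```

A clear dependence of h(q) on q, beyond the scatter expected for a finite sample, is what would point to multifractality.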

WIND SPEED DATABASE PROCESSING

The dataset used in this study was an hourly wind speed time series collected by the Federal Electricity Commission of Mexico in the wind power plant La Venta II. Measurements were taken at a height of 30 m. The sampling rate and averaging period were 1 Hz and 1 hour, respectively. The measurement period ran from February 1, 1994 to April 27, 2000, so the entire time series was made up of 54 610 hourly measurements. Figure (A) shows the complete hourly time series. At first glance, no clear trend, cycles, seasonality, or typical patterns are visible in this figure. Figure (B)‐(H) show the corresponding hourly time series for the years 1994‐2000, respectively. In the same way, the monthly and seasonal window widths of the time series were also generated.

Hourly wind speed time series from the wind power plant La Venta II in Oaxaca, Mexico. (A) Hourly wind speed time series (from February 1, 1994 to April 27, 2000). (B) Hourly wind speed time series (from February 1, 1994 to December 31, 1994). (C) Hourly wind speed time series (from January 1, 1995 to December 31, 1995). (D) Hourly wind speed time series (from January 1, 1996 to December 31, 1996). (E) Hourly wind speed time series (from January 1, 1997 to December 31, 1997). (F) Hourly wind speed time series (from January 1, 1998 to December 31, 1998). (G) Hourly wind speed time series (from January 1, 1999 to December 31, 1999). (H) Hourly wind speed time series (from January 1, 2000 to April 27, 2000)

The Federal Electricity Commission of Mexico (CFE) has four measurement masts at the wind power plant La Venta II. Each mast has a class A three‐cup anemometer, a wind vane, and a Campbell Scientific datalogger. The characteristics of the wind velocity sensors used to build the dataset are shown in Table .

Wind speed and wind direction sensors specifications

Specifications	Anemometer	Wind Vane
Measurement range	0‐45 m/s	0‐360°
Accuracy	±1.5%	±1.5%
Resolution	<0.1 m/s	<1°

Random walk

A random walk of the wind speed time series was generated according to the random walk equation defined above, in order to visualize the trajectories followed by the wind speed. Figure shows the random walk of the time series of La Venta, Oaxaca. The reduction of the data values is due to the square root transformation applied to the dataset to stabilize the mean and variance.

Random walk of the wind speed time series of La Venta, Oaxaca
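
A minimal sketch of this construction is given below, assuming that the square root transformation is applied first and that the walk is the running sum of deviations from the mean of the transformed values. The function name and the four sample speeds are illustrative and are not taken from the La Venta dataset.

```python
import math

def random_walk_profile(speeds):
    """Cumulative sum of deviations from the mean of sqrt-transformed speeds."""
    transformed = [math.sqrt(v) for v in speeds]  # variance-stabilizing step
    mean = sum(transformed) / len(transformed)
    profile, acc = [], 0.0
    for x in transformed:
        acc += x - mean
        profile.append(acc)
    return profile

# Illustrative values in m/s; sqrt gives [2, 3, 4, 5] with mean 3.5.
walk = random_walk_profile([4.0, 9.0, 16.0, 25.0])
```

Because the deviations are taken from the sample mean, the walk always returns to zero at its last point; its excursions in between reflect runs of above‐average or below‐average wind speed.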

Fractal dimension (D)

The concept of dimension, used by Benoit Mandelbrot, is a simplification of the one used by Felix Hausdorff and corresponds to the definition of the capacity of a geometric figure, which was established in 1958 by the Russian mathematician Kolmogorov.

The dimension of a set E is defined by the following expression: $d_{E} = \lim_{r \rightarrow 0} \frac{\log N (r)}{\log (1 / r)}$ where E is a bounded subset of a p‐dimensional Euclidean space and N(r) is the minimum number of p‐dimensional cubes of side r needed to cover E. For a point, N(r) = 1; for a line, N(r) = 1/r; for a surface, N(r) = 1/r². It is easy to verify, by applying this expression, that d_E = 0, 1, and 2 for a point, a line, and a plane, respectively.

A two‐dimensional object, such as a square on a plane, can also be divided into N self‐similar parts, each of which is in relation $r = 1 / \sqrt{N}$ to the total. Likewise, a three‐dimensional object, such as a cube, can be divided into N smaller cubes, each of which is in relation $r = 1 / (N^{1 / 3})$ to the total. In the same way, a D‐dimensional, self‐similar object can be divided into N smaller copies, each related by $r = 1 / (N^{1 / D})$ to the original object, from which N = 1/r^D. The value of the fractal dimension, D, corresponding to an exact similarity can then be calculated as: $D = \frac{\log N}{\log (1 / r)}$

where D does not have to be an integer and the logarithms can be taken at any base.
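
As a brief worked example of the similarity relation N = 1/r^D (added for illustration; neither object below is part of the studied data), a square split into four half‐size copies recovers the integer dimension, whereas the Koch curve, in which each segment is replaced by four copies scaled by one third, gives a noninteger value:

```latex
% Square: N = 4 copies, each scaled by r = 1/2
D = \frac{\log N}{\log (1/r)} = \frac{\log 4}{\log 2} = 2

% Koch curve: N = 4 copies, each scaled by r = 1/3
D = \frac{\log N}{\log (1/r)} = \frac{\log 4}{\log 3} \approx 1.26
```

The second value lies between 1 and 2, which is the range in which the dimension of a time series trace is expected to fall.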

Finally, the fractal dimension provides an idea of the real capacity of an object to fill the space in which it is embedded.

Fractal dimension, D, and Hurst coefficient, H

The fractal dimension of a geometric plane is 2. The fractal dimension of a random trajectory would be halfway between that of a line and that of a plane, that is, 1.5.

Hurst coefficient, H, can be converted into the fractal dimension, D, using the following expression:

$D = 2 - H$

Thus, for H = 0.5, a value of D = 1.5 is obtained; both values are consistent with an independent random system. A value of 0.5 < H < 1.0 results in a fractal dimension closer to that of a line. In summary, a persistent time series in Hurst's terms would result in a smoother line with fewer peaks than a random path. In the same way, an antipersistent time series, 0 < H < 0.5, would produce a greater fractal dimension, that is, a more rugged series than a random trajectory and thus a system subject to more reversals due to its antipersistence.

The following is a classification of time series according to the Hurst coefficient, H:

0 < H < 1/2: in this case, the time series is antipersistent, a behavior known as mean reversion: if the series has been above the value that served as its long‐term mean in the previous period, it is more likely to go down in the next period, and vice versa. Antipersistence is a stochastic process that occurs in time series with a negative correlation between their increments, so events tend to return to the place they came from. Such a time series is considered to have “pink noise,” which is a common process in nature and is related to relaxation (dynamic equilibrium) and turbulence.
H = 1/2: in this case, the data are independent and have no memory (there is no correlation between the observations). The series is a random process that complies with all the features of standard Brownian motion. This is known as “white noise.”
1/2 < H < 1: in this case, the time series is persistent and reinforces the trend, that is, if the series was above (or below) its long‐term mean in the previous period, it will most likely remain above (or below) it in the following period. If H = 1, the series is deterministic; the noise color is black, and it appears in long‐term cyclic processes. Examples of this are the water level of rivers, the number of sunspots, etc.
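
The classification can be expressed as a small helper. The sketch assumes the relation D = 2 − H, which is consistent with the statement that H = 0.5 corresponds to D = 1.5; the function names and the optional tolerance around 0.5 are illustrative choices, not part of the original study.

```python
def fractal_dimension_from_hurst(h):
    """Convert a Hurst coefficient into a fractal dimension (assumes D = 2 - H)."""
    if not 0.0 < h <= 1.0:
        raise ValueError("H is expected to lie in (0, 1]")
    return 2.0 - h

def classify_hurst(h, tol=0.0):
    """Label a series by its Hurst coefficient.

    `tol` is an assumed tolerance that treats estimates within
    0.5 +/- tol as indistinguishable from a random series.
    """
    if not 0.0 < h <= 1.0:
        raise ValueError("H is expected to lie in (0, 1]")
    if abs(h - 0.5) <= tol:
        return "random"
    return "persistent" if h > 0.5 else "antipersistent"

labels = [classify_hurst(h) for h in (0.3, 0.5, 0.8)]
d_random = fractal_dimension_from_hurst(0.5)
```

Since any estimate of H carries sampling error, a nonzero `tol` (for example, one derived from the scatter of estimates on shuffled data) is usually more defensible than an exact comparison with 0.5.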

DESCRIPTIVE STATISTICS

La Venta, Oaxaca, located in the Isthmus of Tehuantepec, Mexico, is characterized by its high wind speeds, with records of 32.88 m/s for the hourly maximum speed and 9.53 m/s for the average speed, and a high standard deviation of 5.63 m/s. The coefficient of variation shows that the standard deviation amounts to 59.16% of the mean, see Table .

Descriptive statistics of the hourly wind speed time series

Statistical measure of data	Value
Number of data, N [‐]	54 610
Maximum velocity, v_max [m/s]	32.88
Minimum velocity, v_min [m/s]	0
Mean velocity, v_mean [m/s]	9.53
Standard deviation, v_sd [m/s]	5.63
Coefficient of variation, CV[%]	59.16
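
The quantities in the table can be reproduced from a list of hourly speeds with a few lines of code. The sketch below is illustrative: the function name is hypothetical, the population standard deviation is an assumption (the source does not say whether the sample or population form was used), and the eight sample values are not wind data. With the rounded mean and standard deviation from the table, the ratio 5.63/9.53 gives about 59.08%, close to the reported 59.16%; the small difference presumably comes from rounding of the inputs.

```python
import statistics

def describe(speeds):
    """Return the descriptive measures listed in the table for a speed sample."""
    mean = statistics.fmean(speeds)
    sd = statistics.pstdev(speeds)  # population form; an assumption
    return {
        "n": len(speeds),
        "max": max(speeds),
        "min": min(speeds),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
    }

# Small illustrative sample (mean 5, standard deviation 2).
stats = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Coefficient of variation recomputed from the rounded values in the table.
cv_from_table = 100.0 * 5.63 / 9.53
```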

The purpose of statistical measures is to show the outstanding characteristics of the studied data. It is also useful to represent the data graphically to get a qualitative impression of the information. Figure shows the wind speed histogram used in this study; the bin width was defined according to Otto et al. This figure shows the bimodal behavior of the wind speed distribution, which is a typical characteristic of the Isthmus of Tehuantepec area and has been widely studied.

Histogram of wind speed at La Venta, Oaxaca (75 months)

Analysis of the time series components

To identify the components of the wind speed time series, it was necessary to choose a decomposition model capable of capturing its main characteristics. Two models can be used for this purpose: the additive model or the multiplicative model. The additive model is appropriate when the magnitude of the seasonal fluctuations of the series does not change as the trend does. On the other hand, multiplicative models are used when the magnitude of the seasonal fluctuations in the time series grows and decreases proportionally with the trend.

In the case of the studied wind speed time series, a trend has not been observed. However, it shows fluctuations with different magnitudes. So, a multiplicative decomposition would be useful, according to equation , which allows defining the trend and the seasonality of the dataset.

$Y_{t} = T_{t} \times S_{t} \times E_{t}$

where Y_t is the observed value in period t; T_t is the trend; S_t is the seasonality; and E_t is the error.

An analysis of the linear trend of the dataset shows that all data oscillate around the average, that is, the dataset shows little or no slope, according to the following equation: V_speed = 4E‐6·t + 9.645 (see Figure ). In the same way, to find the seasonality of the time series, a moving average of order 720 hours (30 days) was applied. Figure shows the actual time series vs the linear model and the moving average; the seasonality of the data, typical of meteorological events, can be clearly observed in the latter.

Comparison of real wind speed time series from La Venta, Oaxaca vs linear trend and moving average models
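
The two models compared in the figure can be sketched as follows. The least‐squares line and the trailing moving average of order 720 follow the description in the text; the function names and the synthetic sinusoidal series (a yearly cycle of amplitude 3 m/s around 9.5 m/s) are assumptions used only to exercise the code, not the La Venta measurements.

```python
import math

def linear_trend(values):
    """Least-squares line through (t, values[t]); returns (slope, intercept)."""
    n = len(values)
    mt = (n - 1) / 2
    my = sum(values) / n
    sxx = sum((t - mt) ** 2 for t in range(n))
    sxy = sum((t - mt) * (y - my) for t, y in enumerate(values))
    slope = sxy / sxx
    return slope, my - slope * mt

def moving_average(values, order):
    """Moving average over windows of `order` consecutive samples."""
    if not 1 <= order <= len(values):
        raise ValueError("order must be between 1 and len(values)")
    window_sum = sum(values[:order])
    out = [window_sum / order]
    for i in range(order, len(values)):
        window_sum += values[i] - values[i - order]  # slide the window by one
        out.append(window_sum / order)
    return out

# Synthetic hourly series: three years with a purely seasonal cycle.
hourly = [9.5 + 3.0 * math.sin(2 * math.pi * t / 8760) for t in range(3 * 8760)]
slope, intercept = linear_trend(hourly)
smooth = moving_average(hourly, 720)  # 720 hours = 30 days
```

On this synthetic series the fitted slope is close to zero while the 30‐day moving average still follows the yearly cycle, which mirrors the qualitative behavior described for the measured data.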

Residual values were calculated and plotted, as shown in Figure , to find out whether there was another pattern that the previous analysis could not detect. In this figure, it can be observed that these values oscillate around zero.

Residual values of the wind speed time series from La Venta, Oaxaca

For this kind of analysis, it is essential to know the behavior of the time series components; however, to complement the analysis, it was necessary to know its internal structure. This was possible through the fractal and multifractal techniques that describe the internal structure of the wind speed time series.

Autocorrelation function and partial autocorrelation function

To identify the relationship that could exist between the measurements of the wind speed time series, the autocorrelation function and the partial autocorrelation function were computed for the time series. First, a square root transformation was applied to the time series to stabilize the mean. This transformation was selected because applying it neither loses data nor produces negative numbers, as was the case with the base‐10 logarithm. Once the mean was stabilized, the data were differenced to establish the relationship with a single past event. Given that the measurement data come from a meteorological sample, it was necessary to know whether there was a relationship with seasonal situations. Thus, besides the difference of 1 step backward, a difference of 24 steps backward was carried out.

Figure shows the autocorrelation function of the transformed time series. The values of the autocorrelation and partial autocorrelation functions always lie between −1 and 1. A positive value indicates a positive linear relationship between the events and their lags, and a negative value indicates a negative linear relationship. In this figure, it is possible to observe a negative relationship, with an approximate value of −0.5, between the time series and its event 24 steps backward. The partial autocorrelation function shows the relationship that exists between the data without considering the intermediate events. In Figure , a negative relationship between the data of the time series can be observed, not only with the event 24 steps backward but at every multiple of 24 steps.

Autocorrelation function of the wind speed time series from La Venta, Oaxaca

Partial autocorrelation function of the wind speed time series from La Venta, Oaxaca
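
The preprocessing and the autocorrelation calculation can be sketched as below. The function names and the synthetic input (independent uniform values standing in for hourly speeds) are assumptions for illustration. As a reference point for reading such plots, differencing a series with no correlation at all at lag 24 yields a theoretical autocorrelation of −0.5 at lag 24, which is the known value the synthetic check relies on.

```python
import math
import random

def difference(values, lag):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]

def autocorrelation(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [x - mean for x in values]
    c0 = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

# Synthetic, uncorrelated "speeds" (illustrative only).
random.seed(4)
speeds = [random.uniform(0.0, 20.0) for _ in range(5000)]

transformed = [math.sqrt(v) for v in speeds]   # stabilize the spread
step1 = difference(transformed, 1)             # relation with the previous hour
step24 = difference(step1, 24)                 # relation with the same hour one day before
acf = autocorrelation(step24, 48)
```

The partial autocorrelation function is not computed here; it would normally be obtained from the autocorrelations with the Durbin‐Levinson recursion or a statistics package.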

Discordance tests

Discordance tests were used to detect outlier values in normal samples. An outlier value is defined as an observation in a dataset (or subset) that seems inconsistent with the rest of the data. Statistical outlier tests can be classified into five main types:

Statistical deviation or dispersion.
Sum of squares.
Statistical dispersion or total interval.
Mean excess function.
High‐order moment.

The dataset of January 1999 was selected as an example because of its bimodal behavior, where the first mode was presented at moderate wind speeds (between 0 and 9.5 m/s) and the second mode was presented at strong wind speeds (between 9.5 and 25 m/s). Table shows the statistical measurements of the sample to know its behavior before applying the discordance tests.

Descriptive statistics of January 1999

Month	Average	Mode	St. Dv.	Variance	Skewness	Kurtosis
January	12.33	18.64	6.92	47.85	−0.33	−1.32

It is well known that the measures of central tendency are similar in samples that have normal behavior. This condition was not observed in the analyzed January sample. The skewness value near zero suggests normality; however, the kurtosis value of −1.32 is well below that of a normal distribution (3, or 0 in the excess kurtosis convention that the negative value implies). This is confirmed qualitatively by the histogram of the January 1999 dataset.

Normality testing: skewness

The data required to perform the test on the value furthest from the mean are shown below; the critical values can be found in Ref. :

Skewness statistical test: [Image Omitted. See PDF]
Average of the sample: $\bar{x} = 12.33 m/s$ .
The tested value was the furthest from the average, in this case: 24.97 m/s.
Skewness calculation: S_k = −0.3332 (the absolute value was used for the comparison with the critical value).
Critical value with 99% confidence: 0.22.
Comparison of skewness value vs critical value: 0.3332 > 0.22, then the tested value is an outlier.
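
The iterative use of this test can be sketched as follows. The sketch assumes the moment form of the sample skewness and removes, one at a time, the value furthest from the mean while the absolute skewness exceeds the critical value. The critical value depends on the sample size and confidence level, so it is passed in as a parameter rather than computed; the function names and the small test sample are illustrative, and this is not the authors' code.

```python
import math

def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0.0 for a constant sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5

def censor_by_skewness(values, critical_value, min_size=3):
    """Drop the value furthest from the mean until |skewness| <= critical_value.

    Returns (kept, removed). `critical_value` is held fixed here for
    simplicity, although tabulated values change with sample size.
    """
    kept = list(values)
    removed = []
    while len(kept) > min_size and abs(sample_skewness(kept)) > critical_value:
        mean = sum(kept) / len(kept)
        suspect = max(kept, key=lambda x: abs(x - mean))
        kept.remove(suspect)
        removed.append(suspect)
    return kept, removed

# Illustrative sample: symmetric values plus one discordant observation.
data = list(range(-10, 11)) + [100]
kept, removed = censor_by_skewness(data, critical_value=0.5)
```

The removed values can then be collected and passed through the same function again to look for a second population, as the text describes for the January sample.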

The value identified as an outlier was eliminated, so the sample was modified and, consequently, the values of the descriptive statistics changed. This process was repeated until the test no longer detected outliers. Finally, the censored sample, that is, the sample without outliers, was reduced to 482 data for January, so 262 data of the original sample were eliminated. The eliminated values, however, were regrouped and again subjected to the discordance tests, which can detect further populations that make up the sample. This procedure must be carried out with at least the other four tests.

Statistical measures can be observed in Table , where the primary sample was segmented into two samples: the subset January S1 corresponds to the low wind speeds and the subset January S2 corresponds to the high wind speeds. Thus, when both samples were analyzed separately, the datasets of January 1999 showed normal behavior.

Descriptive statistics of the censored January sample

Month	Average	St. Dv.	Mode	Variance	Skewness	Kurtosis
January S1	3.9	2.3	4.3	5.4	0.6	−0.3
January S2	17.1	3	18.6	9.3	0	−0.4

Modeling process

The flowchart of the modeling process proposed for the analysis of the fractal time series is presented in Figure . From this flowchart, it can be seen that it is necessary to determine whether the time series to be modeled is monofractal or multifractal, to use the proper modeling technique.

Modeling process flowchart of the fractals time series

MODELING OF A TIME SERIES OF MONTHLY WINDOW WIDTH

A self‐affine trace generator was employed to illustrate the way of modeling long‐term time series using the Hurst coefficient. This trace generator requires three main parameters: Hurst coefficient, speeds range, and the number of forward steps.

The technique for modeling time series as a fractional Brownian motion (mBf) is derived from Brownian motion (mB), described by Robert Brown in 1827 and modeled by Norbert Wiener in 1923. Like all motion, the mB is continuous, but its constant changes of direction make it nondifferentiable. Its consistency characteristics are more statistical than geometrical. The model proposed by Wiener is also known as the random midpoint displacement. This model consists of displacing an intermediate point of a line segment: the value at the midpoint is calculated and a random Gaussian value, which can be either positive or negative, is added to it.

The systematic steps for the calculation are described below:

A Gaussian distribution is defined for [−1,1] from where it is possible to obtain all the needed random numbers.
Set X(0) = 0, and select a random number X(1).
Draw a segment between the points $(0, X(0))$ and $(1, X(1))$.
Divide the time interval into two intervals of the same length and calculate the value of X(1/2) as the average of $X(0)$ and $X(1)$ plus a random value d₁ rescaled by 1/2.
Discard the generated segment and draw two new segments joining the calculated value X(1/2) with X(0) and X(1). Each time interval is again divided into two equal ones, and the values $X(1/4)$ and $X(3/4)$ are calculated as the average of the values at the ends of each interval plus two random values d₁₂ and d₂₂ rescaled by $1 / (2 \sqrt{2})$.
Discard the previous segments and draw the ones obtained with the new and previous values.
Repeat the process n times using the scale $1 / (2 \sqrt{2^{n + 1}})$ as a factor to obtain the random number d_nk for $k = 1, 2, \dots, 2^{n}$ .

The mB is the particular case of the mBf with H = 0.5. The mBf is a self‐similar process, meaning that it is invariant in distribution under an adequate change of time and space scale. To simulate the mBf, it is possible to use a variant of the random midpoint displacement algorithm. The initial scale factor, which multiplies the random number used to calculate X(1/2), is $\sqrt{1 - 2^{2 H - 2}}$, and in each subsequent step it is multiplied by $1/2^{H}$.
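A minimal sketch of a midpoint-displacement trace generator that follows the scale rule stated above (initial factor $\sqrt{1 - 2^{2H-2}}$, multiplied by $1/2^{H}$ at every subsequent level). Several details are assumptions rather than the authors' implementation: standard normal random numbers are used, the trace is truncated to the requested number of steps, and it is then mapped linearly onto the target speed range. The names `midpoint_displacement` and `self_affine_trace` are hypothetical; the example values (H = 0.13, 4.50 to 12.45 m/s, 744 hours) are the August figures quoted in the text.

```python
import numpy as np

def midpoint_displacement(hurst, levels, rng):
    """Return 2**levels + 1 samples of an approximate mBf trace on [0, 1]."""
    n = 2 ** levels
    x = np.zeros(n + 1)                 # X(0) = 0
    x[n] = rng.standard_normal()        # X(1): one random number
    scale = np.sqrt(1.0 - 2.0 ** (2.0 * hurst - 2.0))  # initial scale factor
    step = n
    while step > 1:
        half = step // 2
        mid = np.arange(half, n, step)  # indices of the new midpoints
        x[mid] = 0.5 * (x[mid - half] + x[mid + half]) \
            + scale * rng.standard_normal(mid.size)
        scale *= 2.0 ** (-hurst)        # multiply by 1/2^H for the next level
        step = half
    return x

def self_affine_trace(hurst, v_min, v_max, n_steps, seed=0):
    """Generate n_steps (>= 2) values and rescale them to [v_min, v_max] (assumed mapping)."""
    rng = np.random.default_rng(seed)
    levels = int(np.ceil(np.log2(n_steps)))  # dyadic grid with at least n_steps points
    trace = midpoint_displacement(hurst, levels, rng)[:n_steps]
    lo, hi = trace.min(), trace.max()
    return v_min + (trace - lo) * (v_max - v_min) / (hi - lo)

series = self_affine_trace(hurst=0.13, v_min=4.50, v_max=12.45, n_steps=744)
```

Because the trace is rescaled to the speed range at the end, the overall amplitude convention of the scale factors does not affect the output; only the ratio between successive levels, which is set by H, does.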

In the case that the time series is multifractal, the model can be generated using the multiplicative cascade technique proposed by Müller and Haberlandt.

The month of August was randomly selected for the analysis of the dataset of La Venta, Oaxaca. There are seven samples, one for August of each year from 1994 to 2000. The hourly data were averaged across the years to establish a typical time series of August, that is, all data corresponding to 00:00 hours of each sample were averaged, and the result was the typical value at 00:00 hours of August, and so on. Figure shows the data histogram of the typical month of August. The statistical measurements are: mean speed of 8.69 m/s; minimum speed of 4.50 m/s; maximum speed of 12.45 m/s; standard deviation of 1.34 m/s; and coefficient of variation of 15.45%.

Wind speed time series and histogram of the typical month (August)

The parameters used to carry out the simulations in the self‐affine trace generator were as follows. The Hurst coefficient was H = 0.13, the value obtained for the month of August in the previous analysis using the power spectrum technique. The simulation range was defined by the difference between the maximum and minimum speeds, in this case 7.95 m/s. Finally, the simulation horizon was set to 744 hours.

The optimal number of simulations was 140. This value was obtained from the following expression: [Image Omitted. See PDF]

where n is the optimal number of simulations, α is a probability, z is a standard normal statistic, k is the maximum permissible absolute deviation over the mean of the frequency distribution, and σ² is the variance of the frequency distribution.
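The expression itself is missing from this copy of the text. A standard sample-size formula that matches the symbols defined above is shown here only as an assumption about its form; in particular, the use of the two-sided quantile $z_{\alpha/2}$ is a guess, and the original article should be consulted before relying on it:

```latex
n = \left( \frac{z_{\alpha/2}\,\sigma}{k} \right)^{2}
  = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{k^{2}}
```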

Figure shows a representative time series simulation corresponding to the typical month of August. The statistical measures of the time series presented in this figure are listed in Table . The probability models of the fractal curves are presented in the same figure, where the histograms of the fractal series and of the typical August series show similar behavior.

Fractal wind speed time series and histogram of the typical month (August)

Descriptive statistics of real data and generated fractals

Model	v_mean (m/s)	v_sd (m/s)	CV (%)
Real	8.6938	1.3431	15.45
Fractal 1	8.6576	1.4314	16.53
Fractal 2	8.4509	1.5072	17.84
Fractal 3	8.7199	1.3736	15.75
Fractal 4	8.3241	1.1468	13.78
Fractal 5	9.0332	1.4922	16.52
Fractal 6	8.7020	1.5803	18.16
Fractal 7	8.8745	1.2074	13.60
Fractal 8	8.2167	1.5339	18.67
Fractal 9	8.3774	1.4302	17.07
Fractal 10	7.3853	1.1123	15.06
Fractal 11	8.9046	1.2964	14.56
Fractal 12	8.1432	1.5002	18.42

To establish a comparison that highlights the contribution of the Hurst coefficient to wind modeling, time series were generated using random numbers with Gaussian behavior and the characteristics of the real time series, that is, a mean speed of 8.69 m/s and a standard deviation of 1.34 m/s. Simulations were performed using the Minitab software.
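The authors used Minitab for this step. An equivalent Gaussian comparison series could be sketched with NumPy as below; the seed and the variable names are arbitrary choices for illustration, while the mean, standard deviation, and 744-hour length are the values quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only

# Independent Gaussian values with the mean and standard deviation of the typical August.
gaussian_series = rng.normal(loc=8.69, scale=1.34, size=744)

v_mean = gaussian_series.mean()
v_sd = gaussian_series.std(ddof=1)    # sample standard deviation
cv_percent = 100.0 * v_sd / v_mean    # coefficient of variation, as in the table
```

By construction, successive values of such a series are uncorrelated, so it carries no information about the antipersistent structure that the Hurst coefficient describes.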

Comparing the time series from Figures and , the differences in their structures are evident. The fractal time series captured the variable wind behavior, while the Gaussian time series showed a stationary behavior, whose structure differs from that of the typical month of August.

Gaussian wind speed time series and histogram

The energy production of a wind turbine varies as the wind. For the isolated systems, this means an additional cost in battery banks to store energy. In the case of wind farms, wind variability directly impacts the energy quality delivered to the electrical grid, due to problems with voltage and frequency stability. Usually, electronic converters must be added to the system to regulate these parameters and so, the cost of investment increases.

Prediction models can be used as a complementary solution to mitigate the undesirable effects of wind variability. In this context, it is vital to improve the time series prediction models by incorporating new parameters, such as the Hurst coefficient that is the object of this work.

Finally, the typical month chosen for the analysis has Gaussian behavior. Wind data, however, are commonly described with a Weibull probability model. In that case, the random numbers required by the self‐affine trace generator should follow a Weibull probability model.

ANALYSIS OF RESULTS

Daily time series analysis

Before calculating the Hurst coefficient, it is convenient to know whether the time series presents multifractal characteristics, so the MDFA test was applied to the dataset. According to the MDFA procedure described above, when a time series is monofractal, the slopes of the functions F_q must be equal, which implies a single Hurst coefficient value. In the case of the wind speed time series analyzed, different slopes appear (see Figure ). Figure shows a graphic representation of the H_q coefficients vs the order q of the moments.

Multifractality of the complete wind speed time series

The q‐order Hurst exponent, Hq, for the multifractal time series

The decreasing q‐order Hurst exponent, H_q, indicates that the segments with small fluctuations have a random walk like structure, whereas segments with large fluctuations have a noise‐like structure.
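The monofractality check on the slopes of F_q can be illustrated with a compact multifractal detrended fluctuation analysis. This is a simplified sketch, not the authors' code: it uses only forward, non-overlapping segments and linear detrending, and the function name `generalized_hurst`, the scales, and the orders q are illustrative assumptions. It is run on synthetic white noise, for which all slopes are expected to lie near 0.5.

```python
import numpy as np

def generalized_hurst(x, scales, qs, order=1):
    """Estimate H_q as the slope of log F_q(s) against log s for each order q."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())             # integrated, mean-removed series
    log_f = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = profile.size // s
        segs = profile[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        coeffs = np.polyfit(t, segs.T, order)     # one polynomial fit per segment
        trend = np.vander(t, order + 1) @ coeffs  # fitted trends, shape (s, n_seg)
        f2 = np.mean((segs.T - trend) ** 2, axis=0)  # detrended variance per segment
        for i, q in enumerate(qs):
            if q == 0:
                log_f[i, j] = 0.5 * np.mean(np.log(f2))   # logarithmic average for q = 0
            else:
                log_f[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q
    log_s = np.log(scales)
    return np.array([np.polyfit(log_s, row, 1)[0] for row in log_f])

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)   # synthetic monofractal test signal
h_q = generalized_hurst(noise, scales=[16, 32, 64, 128, 256], qs=[-2, 0, 2])
```

For a monofractal series the entries of `h_q` are approximately equal; a clear decrease of `h_q` with increasing q is the signature of multifractality discussed in the text.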

Since the entire wind speed time series shows multifractal characteristics, discordance tests were applied to identify the populations that form the time series. Test results can be observed in Table . Sample segmentation guarantees the normal behavior of the data. Once the populations were detected, the MDFA test was performed again on each of them. This time, the separated samples were monofractal.

Descriptive statistics and Hurst coefficient on the daily scale of the wind time series

Sample	N	Average	Max.	Min.	St. Dv.	CV	H_BC	H_R/S	H_PS	H_DFA
Sample 1	27 198	4.1	8.3	0.0	2.2	53.1%	0.26	0.08	0	0.09
Sample 2	29 648	13.2	18.0	8.3	2.5	19.2%	0.16	0.16	0	0.18
Sample 3	2058	18.8	19.8	18.0	0.5	2.7%	0.10	0.02	0	0.03
Sample 4	623	20.2	20.7	19.8	0.3	1.3%	0.07	−0.01	0	0.03
Sample 5	608	21.3	22.2	20.7	0.4	1.8%	0.07	0.00	0	0.02
Sample 6	278	22.8	23.6	22.2	0.4	1.7%	0.07	0.08	0	0.03
Sample 7	118	24.5	25.8	23.6	0.6	2.5%	0.07	−0.01	0	0.04
Sample 8	44	27.3	29.9	26.0	1.0	3.7%	0.06	0.19	0	0.05

With regard to the fractal analysis in terms of the coefficient H and the fractal dimension, the methods used to calculate H give different values but agree that the time series is antipersistent, with negative correlations among the events and a short‐term memory. This type of series is rough, reverts to the mean, and presents a tendency toward chaos. Each event tends to return to its previous state (repetition of speeds), and this characteristic becomes stronger as H approaches zero.

Figures show a graphic representation of the models used in the calculation of the Hurst coefficient, H, for August using the box counting method, rescaled range analysis, power spectrum technique, and DFA, respectively.

Calculation of the Hurst coefficient by the box counting method

Calculation of the Hurst coefficient by the R/S method

Calculation of the Hurst coefficient by the power spectrum method

Calculation of the Hurst coefficient by using the DFA technique for “spring S1” sample

Figure is the result of placing the structure of the August sample in grids with different square sizes and recording, in each case, the length of the squares and the number of squares occupied by the series. The length of the squares that make up each grid is plotted against the number of squares (boxes) occupied by the series. The resulting points are fitted with a linear regression model, whose slope represents the fractal dimension of the time series.

Figure shows the result of calculating the fractal dimension of the same time series with the R/S analysis technique. To apply this technique, the most viable option was to divide the series in 62 segments of approximately 26 880 pieces of information each and to calculate the R/S statistic for each segment. The mean and standard deviation were calculated for each segment, the deviation of each piece of information from the mean was determined, and these deviations were accumulated. R was calculated as the difference between the highest and lowest values of the accumulated deviations. Finally, the R/S statistic was obtained by dividing R by the standard deviation of each segment. The fractal dimension (D) is obtained from the slope of a linear regression model fitted to the double logarithmic plot.
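The R/S steps listed above can be sketched as follows. The segment sizes, the function names `rescaled_range` and `hurst_rs`, and the synthetic white-noise input are illustrative assumptions, not the configuration used in the study. The regression slope of log(R/S) against log(segment size) is the estimate of H; the fractal dimension would then follow from the usual relation for self-affine traces, D = 2 − H, which is assumed here to be the one used in the paper.

```python
import numpy as np

def rescaled_range(segment):
    """R/S statistic of one segment."""
    seg = np.asarray(segment, dtype=float)
    cumulative = np.cumsum(seg - seg.mean())   # accumulated deviations from the mean
    r = cumulative.max() - cumulative.min()    # range of the accumulated deviations
    return r / seg.std()                       # rescale by the standard deviation

def hurst_rs(x, sizes):
    """Slope of log(mean R/S) against log(segment size)."""
    x = np.asarray(x, dtype=float)
    mean_rs = []
    for n in sizes:
        n_seg = x.size // n
        segs = x[: n_seg * n].reshape(n_seg, n)   # non-overlapping segments of length n
        mean_rs.append(np.mean([rescaled_range(s) for s in segs]))
    slope, _ = np.polyfit(np.log(sizes), np.log(mean_rs), 1)
    return slope

rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)   # uncorrelated test signal
h_rs = hurst_rs(noise, sizes=[16, 32, 64, 128, 256, 512])
```

For short segments the R/S estimate is known to be biased, which is consistent with the limitation the authors report for samples with few data.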

When the data distribution fits a theoretical distribution (in this case, the normal distribution), the points follow a straight line that closely coincides with the fitted model. In Figure , the points describe a different shape (a curve), which indicates that the data distribution is skewed to the right.

Figure is the calculation of the fractal dimension using the power spectrum technique. The fractal dimension is the value of the slope of the linear regression model applied to the logarithmic transformation of the data.

Fractal dimensions were obtained from the Hurst coefficients calculated with the different techniques, and their values were close to 2: D_BC = 1.75, D_R/S = 1.93, D_PS = 2.00, and D_DFA = 1.91. This condition indicates that the time series comes close to filling the plane: it fills more than a one‐dimensional curve, but less than an area. The analysis of the original, daily time series provides the main structural pattern for the site observed, antipersistence, in accordance with the techniques used in this study.

Analysis of time series of monthly window width

The statistical values and fractal dimension of the time series of monthly window width, namely the mean, standard deviation, coefficient of variation, Hurst coefficient, and fractal dimension, were calculated for a normal sample of each month in the time series using the four techniques outlined in the study.

The time series of monthly window width presents an average wind speed ranging from 27.81 m/s in February to 3.22 m/s in June. The dispersion values range from 3.74 m/s in December to 0.192 m/s in June. The coefficient of variation varies from 0.013 in June to 0.556 in January.

Regarding fractal analysis, calculations of the Hurst coefficient using the box counting technique are practically constant, from 0.064 to 0.1; using the R/S analysis, the values range from 0 to 0.39; using the Power Spectrum, the range is between 0 and 0.153. Finally, using the DFA technique, the range is between 0.011 and 0.217.

All values indicate antipersistent samples. The negative values observed with the H_R/S technique were a consequence of the small number of data in the samples, which represents a weakness of this method.

The power spectrum technique yields zero values in most of its results, which implies a fractal dimension of 2.

Seasonal time series analysis

Results for the statistical and fractal calculations of the seasonal time series showed that the average wind speed was between 3.3 m/s in the summer and 27.75 m/s in winter. The standard deviation fluctuated between 0.33 and 4.7 m/s in the spring and winter, respectively. There is a minimum variation of 1.5% in the winter, and a maximum variation of 55.2% in the autumn.

Results of the Hurst coefficients show antipersistent time series. This behavior was similar to that of the complete time series and of the time series of monthly window width. Figure shows the Hurst coefficient calculation using the DFA technique for the “Spring S1” sample.

Annual time series analysis

The annual time series shows the same behavior as that observed at the other scales with all the fractal analysis methods. The similar behavior obtained at different scales suggests an invariance to the scaling of the time series.

Scaling invariance degree

A typical phenomenon of fractal sets can be found in some time series. This phenomenon, called self‐affinity, is manifested when the time series are represented in time intervals of decreasing duration and their appearance remains similar. The study of the variables and the interactions of a dynamic system through time focuses on finding patterns, structures, critical points of stability or instability, as well as the sensitivity to changes in the initial conditions, to achieve a certain degree of control. A common property of natural dynamic systems is scaling invariance, which means that the central moments of first, second, third, and fourth order remain asymptotically constant in space and time. Generally, every real complex system exhibits scale invariance, that is, its behavior does not change by the rescaling of the variables that govern its dynamics.

It is important to point out that fractals that exist in nature tend to be irregular and self‐similar only in a statistical sense. For example, if a sufficiently large set of objects of the same class is analyzed and a part of any of them is amplified; it is possible that it will not be identical to the original, but it will surely be similar to one of the other members of the set.

CONCLUSIONS

It can be concluded from this study that the structure of the time series at La Venta, Oaxaca, Mexico, is antipersistent which indicates a negative correlation between each event (wind speed) that took place. This indicates a relevant characteristic of the time series structure, which must be taken into account when choosing the technique that will be used to generate a wind speed forecasting model for this site.

In this study, the bimodality observed in the histogram of the site was analyzed, and it was concluded that the samples with this characteristic are multifractal. On the other hand, the samples that present a normal distribution are monofractal.

Regarding the techniques used to calculate the Hurst coefficients, the R/S and DFA techniques obtained similar H values on more occasions than the BC and PS techniques. The PS technique gave mostly values close to zero. However, all values were in the range of antipersistence. When the samples had few data, the R/S technique gave negative values, which is a limitation of this technique.

Finally, it was shown that it is convenient to model the wind time series using the Hurst coefficient, which captures the variability in the time series structure and provides additional information about the time series. This information supports decisions about the management of the data and the techniques applied to them, with the purpose of decreasing the uncertainties.

ACKNOWLEDGMENTS

The authors thank the Comisión Federal de Electricidad (CFE), the Universidad Michoacana de San Nicolás de Hidalgo, and the UNAM‐DGAPA‐PAPIIT Program IA107416 for their support and special considerations in carrying out this study.

© 2019. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

This paper presents a study of a wind speed time series from La Venta, Oaxaca, Mexico. The time series consists of anemometric measurements taken by the Federal Electricity Commission of Mexico over a little more than 6 years. The study was conducted to calculate the Hurst correlation coefficient using the box counting, rescaled range, power spectrum, detrended fluctuation analysis, and multifractal detrended fluctuation analysis techniques. The main objective of this research is to determine the correlation among wind speed data to obtain a better description of the real conditions of the time series, which is not always available, and to define the structure of its behavior. In this way, more suitable wind speed prediction models can be achieved. The results obtained from the techniques above were used to generate fractal time series for a typical month, using the Hurst coefficient and a self‐affine trace generator, which produces fractal time series whose probability distribution is always normal. These time series were compared against time series generated by using random numbers with Gaussian behavior and the characteristics of a typical month. The fractal time series stand out qualitatively in modeling the wind speed variability, and their descriptive statistics (average, standard deviation, and coefficient of variation) are similar to those of the real series. Discordance tests were applied to the datasets to detect deviated values, and so ensure the normal behavior of the samples. These tests showed the existence of different populations with normal behavior in the samples that had bimodal characteristics. By separating the samples, it was possible to apply the self‐affine trace generator to each population found, to generate the fractal time series.
An additional objective was to find the level of change in the structure of the original series concerning its statistical and fractal characteristics at different window widths of the time series (daily, monthly, seasonal, and annual) to identify either a specific tendency or dynamic behavior. The results showed a wind speed time series with a negative correlation (antipersistent), a high degree of scale invariance (homothetic), and a fractal dimension very close to 2, thus indicating that the time series is more irregular than a random process.

Details

Title

Wind speed variability study based on the Hurst coefficient and fractal dimensional analysis

Author

Cadenas, Erasmo¹; Rafael Campos‐Amezcua²; Rivera, Wilfrido³; Marco Antonio Espinosa‐Medina¹; Alma Rosa Méndez‐Gordillo¹; Rangel, Eduardo¹; Tena, Jorge¹

¹ Facultad de Ingeniería Mecánica, Universidad Michoacana de San Nicolás de Hidalgo, Morelia, Mich, Mexico
² Tecnológico Nacional de México/Instituto Tecnológico de Pachuca, Pachuca, Hgo, Mexico
³ Instituto de Energías Renovables de la Universidad Nacional Autónoma de México (UNAM), Temixco, Mor, Mexico

Pages

361-378

Section

RESEARCH ARTICLES

Publication year

2019

Publication date

Apr 2019

Publisher

John Wiley & Sons, Inc.

e-ISSN

20500505

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1002/ese3.277

ProQuest document ID

2328379710
