Concentration data are a vital source of information in the event of atmospheric releases. They are required to accurately guide emergency response, exposure assessment, fusion, and dispersion modeling programs. In forward modeling, sampled concentrations from dispersion experiments are used as a benchmark to evaluate simple and complex dispersion models in terms of their prediction accuracy for plume width, peak concentrations, and so forth. In inverse modeling, concentrations are the key information used to retrieve the tracer sources by assimilating them with the dispersion models. In addition, their statistics are required in both forward and inverse plume modeling, for improving the dispersion models and the source estimation process, respectively.
During a tracer release in the atmosphere, the dispersion process undergoes complex interactions and mixing that are poorly understood. The atmospheric and meteorological processes vary continuously in both space and time. Thus, the true state (or reality) of the concentrations can never be observed completely. Models give an imperfect representation of reality because of simplifications and a limited understanding of the processes. The observations (mainly sampled concentrations) are made only at receptors with fixed locations, and each reflects one realization of the actual dispersion event at a particular location and time. If two observations of the same event are taken sequentially, it is highly probable that they will differ. This raises a major question: how can one determine whether two realizations taken at a receptor during a dispersion event represent the same true state of the concentrations? When several realizations of measurements are taken by a monitoring network during an event, they are expected to carry random variations due to atmospheric turbulence, meteorological variation, and source term uncertainties. These variations are often smoothed out during averaging, or are termed “model errors” when the models do not follow or depict the data features.
An accurate determination of model error statistics is difficult and not yet fully understood. A practical limitation is the scarcity of measurements, which restricts an exact determination of such statistics. Often, data are either sampled as an average or available only for a short duration. As a consequence, hypothetical assumptions (for instance, a Gaussian distribution) are used for the data statistics. Traditional statistics (e.g., mean, standard deviation, and coefficient of variation) or assumptions summarize data properties to a limited extent but do not exactly describe variations in the data. These statistics are strongly influenced by high concentration magnitudes. Frequent variations in concentration, or the occurrence of higher concentrations, are commonly observed in unstable atmospheric conditions, meandering, or low wind conditions. When meteorological conditions are highly variable during a release, the sampled concentration data also contain large variations, and in such a case, these statistics are misleading in representing the data features. Given the sampled data at each receptor, these statistics provide measures and uncertainty individually for each receptor but do not summarize the overall data quality or uncertainty from a modeling point of view. For instance, they do not indicate whether modeling uncertainties will be higher or lower for a particular concentration data set. Thus, these statistics still lack a unique measure that can describe the overall data characteristics. In addition, with traditional statistics, it is difficult to analyze issues of measurement variability, reproducibility, and modeling quality. For example, (i) are two different sets of measurements, taken sequentially during a dispersion scenario, significantly different from each other, or do they represent the same reality of the event? and (ii) are measurements reproducible under a given set of experimental conditions?
Modeling quality refers to whether the data are representable by the dispersion models, in other words, whether a dispersion model will be able to follow or represent the data features.
In general, variation in the measurements occurs mainly due to atmospheric turbulence or meteorological variations. However, the exact cause of variation in the data is difficult to determine or quantify. This becomes more challenging in accidental scenarios (like gas leakage or industrial hazards) where a priori information about the release is not available. A priori information includes the nature of the source, number of source terms, source origin, released mass, source‐receptor distance, and plume directions. In such a situation, commonly used statistics are dubious. For instance, a low‐intensity source term close to the receptor can produce the same concentrations as an intense source located far away from the receptors. Concentrations may correspond to single or multiple simultaneous releases or to varying wind conditions, and they may lack steadiness due to atmospheric or source term uncertainties. Moreover, the impact of uncertainty varies from one receptor to another in a monitoring network.
The objective of this study is to develop a statistical framework to analyze variations in the sampled concentration data without utilizing any a priori information about the releases. In general, concentration data sampled over a monitoring network of receptors can be regarded as vectors which have both magnitude and direction. Analyzing the directional part of the data is comparatively more efficient than analyzing the magnitude in providing information regarding variations in the data. The clear advantages are (i) directions are independent of concentration magnitudes; thus, higher concentration magnitudes do not impact the statistics, (ii) directions can clearly represent the distribution of and steadiness in the data, and (iii) by analyzing directions and comparing them to the meteorological variations, one can analyze the impact of varying atmospheric/meteorological conditions on the data. Indeed, one observation of concentration data must be thought of as a vector in $\mathbb{R}^m$, composed of a set of measurements performed by a network of m receptors at a given time. Since, in real dispersion scenarios, the number of receptors is large (commonly, m>20), the dimensionality of the data is high. To analyze high‐dimensional data, a simple strategy is to normalize the vectors to unit norm and thereby put them on the surface of the unit hypersphere (Mardia, ). Such normalized data are characterized into different distributions based solely on their directional components relative to a prespecified origin (Jetter et al., ). Spherical statistics (Fisher et al., , pp. 29–33 and 67–96) provide a framework to analyze the directional distribution of data in space based on their orientation, mean direction, variations among the vectors, and so forth (Hills, ; Sra, ). For modeling high‐dimensional data, the Watson, Bingham, and Fisher‐Bingham distributions provide distributions with an increasing number of parameters and, thereby, commensurately increasing modeling power (Mardia & Jupp, , pp. 159–206).
Indeed, the Watson distribution is suitable for modeling axially symmetric data (Watson, , ). Data that exhibit several axes of symmetry can be suitably modeled with the Bingham distribution, which generalizes the Watson distribution, though at the expense of dramatically more difficult parameter estimation (Kent et al., ; Sra et al., ).
Our work here builds on the preliminary work and theoretical renormalization framework of Issartel et al. (). The theory suggests transforming measurements into directions on a unit hypersphere in a matrix‐weighted norm framework. The proposed weighted norm maximizes the angular distance between the measurements on the hypersphere $S^{m-1}$. The theory leads to a postulate: if several realizations of measurements taken in a dispersion event correspond to the same true state of the concentrations, the projected measurement directions on the hypersphere will be rotationally symmetric around their principal direction. We utilize this postulate to analyze the distribution of concentration data measured in a real scenario. The observed orientation of measurement directions around the principal direction can be modeled by using the Watson distribution. A major advantage of the Watson distribution is its clustering parameter, which measures the tightness of the directional data distributed around the principal axis. With this clustering parameter, a statistic can be built on the quality of the measurements in view of modeling and on the inherent variation in measurements due to atmospheric perturbations.
In this study, we utilize real measurements taken from a field dispersion experiment called Fusion Field Trials 2007 (FFT07), conducted at Dugway Proving Ground, Utah, in September 2007 (Storwald, ). The FFT07 data set has been widely utilized for the evaluation of dispersion models (Kumar et al., ; Pandey & Sharan, ; Singh & Sharan, ) and inverse modeling algorithms (Albo et al., ; Annunzio et al., ; Singh & Rani, , ). Accordingly, the objectives are (i) to analyze variations in the measurement vectors in the proposed directional framework, (ii) to estimate the clustering parameter (κ) in each trial of the FFT07 experiment, and (iii) to model the distribution of directional data on the hypersphere using the Watson distribution.
The concentration measurements are taken from a field dispersion experiment, called Fusion Field Trials, conducted in September 2007 at Dugway Proving Ground, Utah (Storwald, ). The dispersion experiment involves several continuous releases emitting a nonreactive tracer, propylene (C3H6), from single as well as multiple (up to four) point sources in flat terrain. The concentrations were measured by fast‐response digital photoionization detectors at a frequency response of 50 Hz with a sensitivity of about 0.025 parts per million (ppm) by volume of propylene. A total of 100 concentration samplers were arranged in a rectangular staggered grid of area 475 m × 450 m, 50 m apart and 2 m above the ground (Figure ). The point sources were located at the southeast end of the sampling grid, consecutively at an approximate distance of 70 m from one another. The releases were located approximately within a 30‐ to 50‐m Euclidean distance from the last line of receptors (91–100, see Figure ). The height of release was 2 m above the ground. The release locations vary in each experimental trial; however, for representation, release locations (S1–S4) are exhibited for four release trials in Figure .
A total of 32 trials pertaining to continuous/steady release conditions (i.e., the travel time of the tracer across the sampling grid is shorter than the release duration) are considered in this study; they differ from one another in terms of the number of releases, number of concentration measurements, wind conditions, and atmospheric stability. Mainly, these trials can be categorized according to the number of releases involved. The trials correspond to one point release (Trials 7, 13, 14, 15, 16, 22, 30, 45, 46, 54, and 64), two point releases (Trials 2, 12, 17, 18, 19, 24, 27, 40, and 62), three point releases (Trials 21, 23, 26, 28, 41, 47, 49, and 61), and four point releases (Trials 41, 43, 50, and 55). In most trials, the tracer was released continuously at a constant rate for a duration of 10 min; however, in some trials (13, 14, 17, and 23), the release duration was approximately 7 min.
The prevailing wind at the experimental site blew from southeast to northwest (except in Trial 16, where it blew from northwest to southeast), and the sampling grid was therefore rotated 25° (toward west) from true north to take advantage of the prevailing wind flow (Figure ). In most of the trials, the wind speed was within a range of 2–5 m s−1. This wind speed was sufficient to transport the concentration plume across the sampling grid in less time than the release duration. Thus, a duration of 4 min is considered sufficient to establish a steady state of the concentrations across the sampling grid. Accordingly, concentrations sampled during the first 4 min of the release are ignored in the study, and those sampled in the later 6 min of the release are considered useful. In the data set, fast concentration measurements were collected at a rate of 50 Hz, which yields ≈18,000 measurements in 6 min at each sampler. The fast measurements are expected to be correlated in time. To reduce the correlation between the measurements, an arithmetic time average over every 2 s (or 100 measurements) is applied to the measured concentrations. In this way, ≈180 observations of concentration measurements are obtained for the statistical analysis from each receptor in most trials. Note that, in Trials 13, 14, 17, and 23, the number of observations is smaller (<90) due to the limited release duration.
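The time averaging described above can be sketched as a non-overlapping block mean. This is a minimal illustration on a synthetic signal, not the processing code of the study; the function name `block_average` and the synthetic record are assumptions.

```python
from statistics import fmean

def block_average(samples, block_size=100):
    """Average consecutive, non-overlapping blocks of `block_size` samples."""
    n_blocks = len(samples) // block_size  # an incomplete trailing block is dropped
    return [
        fmean(samples[k * block_size:(k + 1) * block_size])
        for k in range(n_blocks)
    ]

# Synthetic 50 Hz record covering 6 min (360 s), i.e. 18,000 samples.
rate_hz, duration_s = 50, 360
signal = [float(i % 7) for i in range(rate_hz * duration_s)]

# A 2 s window at 50 Hz corresponds to 100 samples per block.
averaged = block_average(signal, block_size=2 * rate_hz)
print(len(averaged))
```

With a 3 min useful window (the 7 min releases), the same call would give 90 blocks at most, which matches the smaller number of observations noted for those trials.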
The concentration measurements were reported in units of kilograms per cubic meter and ppm. These are converted to grams per cubic meter for their application in this study. Only nonzero concentration measurements are utilized here for the statistical analysis. The meteorological measurements (wind components and temperature) are taken from a 32‐m ultrasonic tower with five levels (2, 4, 8, 16, and 32 m) located at the grid center.
The sampled concentration data from the FFT07 trials are nondirectional (i.e., in the form of magnitudes). For a given time, the data collected over the monitoring network are considered as a vector in Euclidean space. The proposed statistical analysis requires a transformation of these concentration data into directional data. Accordingly, as a first step, the concentration data are transformed into directions or points on the hypersphere. Then, the orientation of the directions is analyzed in view of their distribution, clustering, and mean orientation. Finally, the observed data distribution is modeled by the Watson distribution using its principal axis and clustering parameter. Note that bold symbols denote vectors/matrices, scalars/constants are set in italics, and “T” denotes the transpose.
The concentration data are transformed into unitary vectors or directions on the hypersphere through a process of normalization. A standard method is to normalize the concentration data vectors by their Euclidean norm. However, in this study, a weighted Euclidean norm is preferred for transforming the concentration data into directions. The weighted norm represents the strength of the signal perceived by the network and is defined by means of a Gram matrix introduced by Issartel () in order to discriminate between various possible locations of an unknown release. The weighted norm maximizes the angular distance between the directions on the hypersphere; that is, the Gram matrix is chosen such that the unit vectors are distributed on the hypersphere as widely as possible. The computation of the Gram matrix utilizes the relationship between source and receptors. Since we assume here that the source terms are unknown, the relationship is described by means of a sensitivity matrix, which describes the sensitivity of the measurements with respect to the emissions in the model space and is obtained with the help of adjoint dispersion models (Hourdin & Talagrand, ; Issartel & Baverel, ; Pudykiewicz, ).
Suppose μ1, μ2, …, μp is the series of concentration measurement vectors, where μi ∈ $\mathbb{R}^m$, m denotes the number of samplers, and the measurement series contains p measurement vectors. The sensitivity matrix A is composed of sensitivity vectors, that is, A = [a1, a2, …, aN], in which ai ∈ $\mathbb{R}^m$, i=1,2,…,N, where N denotes the total number of cells in the discretized space. The column vectors of the sensitivity matrix, ai, are derived as solutions of an adjoint of the atmospheric dispersion model in the N‐dimensional model space with respect to the ith measurement (Pudykiewicz, ). For details regarding the computation of the sensitivity matrix, readers are referred to Issartel et al. (), Pudykiewicz (), Sharan et al. (), Singh and Rani (), and so forth. Note that the computation of the sensitivity elements suffers from a singularity artifact at the sampling locations due to strong concentration gradients around the cells containing measurements. To deal with this, Issartel et al. () proposed a weight matrix W which removes such numerical artifacts and regularizes the sensitivity vectors. The weight matrix is purely diagonal, and its elements wii are obtained through an optimality criterion (Issartel et al., ) in which awi=ai/wii is the normalized sensitivity vector and Hw is the Gram matrix used to rescale the data.
Now, in view of the matrix Hw, a weighted Euclidean norm of the concentration data vector is defined, and the unitary vector νi corresponding to the measurement vector μi is obtained by normalizing with respect to this norm. These unitary vectors can be seen as directions or points on the surface of the positive octant of the hypersphere $S^{m-1}$. Using Hw in the transformation of data to unitary vectors maximizes the angular distance between the unitary vectors (or directions) when projected on the hypersphere $S^{m-1}$. Further, these p directions can be transformed into antipodally symmetric axes (or axial data) on the hypersphere by including an additional set of p observations with a negative sign, −ν1, −ν2, …, −νp. Indeed, +νi and −νi are equivalent in representing the same measurement vector.
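The normalization and axial duplication can be sketched as follows. This is only an illustration: it uses the plain Euclidean norm as a stand-in, not the Hw‐weighted norm of the study, and the function names and sample vectors are assumptions.

```python
import math

def to_unit_direction(mu):
    """Scale a concentration vector to unit Euclidean norm (stand-in for the weighted norm)."""
    norm = math.sqrt(sum(c * c for c in mu))
    if norm == 0.0:
        raise ValueError("an all-zero vector has no direction")
    return [c / norm for c in mu]

def to_axial(directions):
    """Append the antipode -nu of every direction nu, turning p directions into 2p axial vectors."""
    return directions + [[-c for c in nu] for nu in directions]

# Three hypothetical measurement vectors on a 3-receptor network.
# The second is the first scaled by 10: same direction, larger magnitude.
measurements = [[0.2, 1.5, 0.4], [2.0, 15.0, 4.0], [1.0, 0.1, 0.3]]
directions = [to_unit_direction(mu) for mu in measurements]
axial = to_axial(directions)

for nu in directions:
    print([round(c, 4) for c in nu])
```

The first two printed directions coincide, which illustrates why statistics built on directions are not dominated by high concentration magnitudes.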
Ideally, the axially symmetric vectors (or axes ±ν1, ±ν2, …, ±νp on $S^{m-1}$) are expected to be distributed (or clustered) rotationally symmetrically around a major axis ±ϑ called the principal axis. Note that the directions (or directional data) consist of only the positive vectors (+ν1, +ν2, …, +νp) from the axial data.
The orientation matrix describes several orthogonal axes along which the directional data are distributed. It is defined as (Mardia & Jupp, , p. 165)
$$\mathbf{T} = \frac{1}{p}\sum_{i=1}^{p} \boldsymbol{\nu}_i \boldsymbol{\nu}_i^{T},$$
where T is a symmetric m × m matrix with unit trace. The eigenanalysis of the orientation matrix provides an interpretation of the shape of the data distribution on the hypersphere.
Let λ1 ≥ λ2 ≥ … ≥ λm be the eigenvalues of T. When all eigenvalues are approximately equal, that is, λ1≃λ2≃…≃λm, the axial data are uniformly distributed over the hypersphere. When λ1 is large and the secondary eigenvalues are small and approximately equal, the axial data are concentrated, rotationally symmetrically, around the first eigenaxis (i.e., eigenvector). The first eigenaxis is also called the principal axis and refers to the eigenvector corresponding to the largest eigenvalue. In this case, a bipolar distribution function may fit the distribution of the axial data, and a measure of clustering around the principal axis can be determined (Watson, ). In the modeling of parametric distributions, the principal axis is also referred to as the mean axis. When several eigenvalues are significantly large, the directional data are distributed around several orthogonal axes, and thus, a girdle distribution may fit the observed axial data on the hypersphere (Watson, ). In this study, our interest is limited to fitting axial data that are rotationally symmetric around the principal axis, and thus, a bipolar Watson () distribution is fitted to the data using the mean axis (i.e., principal axis) and the clustering parameter.
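As a hedged illustration of the eigenanalysis, the sketch below builds the orientation matrix as the average outer product of unit vectors and extracts the largest eigenvalue and principal axis by power iteration. Power iteration is a standard-library stand-in for a full eigendecomposition (it returns only the leading pair), and the sample vectors and function names are assumptions.

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]

def orientation_matrix(directions):
    """T = (1/p) * sum_i nu_i nu_i^T for unit vectors nu_i (unchanged if -nu_i are added)."""
    p, m = len(directions), len(directions[0])
    return [[sum(nu[r] * nu[c] for nu in directions) / p for c in range(m)]
            for r in range(m)]

def principal_axis(T, iterations=500):
    """Largest eigenvalue and its eigenvector of a symmetric PSD matrix by power iteration."""
    m = len(T)
    v = [1.0 / math.sqrt(m)] * m  # start vector; fine for data in the positive orthant
    for _ in range(iterations):
        w = [sum(T[r][c] * v[c] for c in range(m)) for r in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[r] * T[r][c] * v[c] for r in range(m) for c in range(m))
    return lam, v

# Hypothetical directions clustered tightly around the first coordinate axis.
raw = [[1.0, 0.10, 0.05], [1.0, 0.05, 0.10], [1.0, 0.20, 0.10], [1.0, 0.10, 0.20]]
directions = [normalize(v) for v in raw]

T = orientation_matrix(directions)
lam1, axis = principal_axis(T)
trace = sum(T[i][i] for i in range(len(T)))
print(round(trace, 12), round(lam1, 3))
```

Because the eigenvalues sum to the trace (here 1), a value of λ1 close to 1 leaves little weight for the secondary eigenvalues, which is the bipolar situation described above.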
The Dimroth‐Scheidegger‐Watson distribution (Watson, ) is used for modeling directional observations (i.e., axial data) which are antipodally symmetric (i.e., ν and −ν are indistinguishable, so it is ±ν that is observed). The density function is defined as (Mardia & Jupp, , p. 181)
$$f(\pm\boldsymbol{\nu};\boldsymbol{\vartheta},\kappa) = M\left(\tfrac{1}{2},\tfrac{m}{2},\kappa\right)^{-1}\exp\left\{\kappa\,(\boldsymbol{\vartheta}^{T}\boldsymbol{\nu})^{2}\right\},$$
where ϑ denotes the mean axis, κ is the clustering parameter, and M(1/2, m/2, κ) denotes the Kummer function (Abramowitz & Stegun, , p. 504). Note that the Kummer function is also known as the confluent hypergeometric function of the first kind. The distribution is rotationally symmetric about ϑ. For κ>0, the density function has its maximum at ±ϑ, and so the distribution is bipolar. As κ increases, the distribution becomes more concentrated about ±ϑ. Thus, the parameter κ measures clustering around ±ϑ. For κ<0, the distribution is concentrated around the great circle orthogonal to ±ϑ and is called a symmetric girdle distribution.
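A minimal sketch of evaluating such a density, assuming the standard form f(±ν) = exp{κ(ϑTν)²} / M(1/2, m/2, κ) taken with respect to the uniform distribution on the hypersphere. The Kummer function is summed directly from its power series, which is adequate only for moderate |κ|; the function names are assumptions.

```python
import math

def kummer_m(a, c, x, tol=1e-15, max_terms=10_000):
    """Kummer's M(a, c, x) = sum_k (a)_k / (c)_k * x^k / k!, by direct summation."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (c + k) * x / (k + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def watson_density(nu, axis, kappa):
    """Watson density at unit vector nu, relative to the uniform distribution on the sphere."""
    m = len(nu)
    cos_theta = sum(a * b for a, b in zip(axis, nu))
    return math.exp(kappa * cos_theta ** 2) / kummer_m(0.5, m / 2, kappa)

axis = [1.0, 0.0, 0.0]
on_axis = watson_density([1.0, 0.0, 0.0], axis, kappa=5.0)
antipode = watson_density([-1.0, 0.0, 0.0], axis, kappa=5.0)
equator = watson_density([0.0, 1.0, 0.0], axis, kappa=5.0)
print(on_axis, antipode, equator)
```

For κ>0 the value on the axis exceeds the value on the equator, and the density is identical at ν and −ν, in line with the bipolar, antipodally symmetric behavior described above.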
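The log‐likelihood function is (Sra & Karp, )
$$\ell(\boldsymbol{\vartheta},\kappa;\boldsymbol{\nu}_1,\ldots,\boldsymbol{\nu}_p) = p\left(\kappa\,\boldsymbol{\vartheta}^{T}\mathbf{T}\boldsymbol{\vartheta} - \ln M\left(\tfrac{1}{2},\tfrac{m}{2},\kappa\right) + \gamma\right),$$
where γ is a constant term.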
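Let ξ1,ξ2,…,ξm (such that ξiTξi=1) be the unit eigenvectors corresponding to the eigenvalues λ1 ≥ λ2 ≥ … ≥ λm of T. Since ϑTϑ=1, it follows from the log‐likelihood (Sra & Karp, ) that
$$\hat{\boldsymbol{\vartheta}} = \boldsymbol{\xi}_1 \ \text{ if } \hat{\kappa} > 0, \qquad \hat{\boldsymbol{\vartheta}} = \boldsymbol{\xi}_m \ \text{ if } \hat{\kappa} < 0.$$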
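Sra and Karp () obtain an estimate $\hat{\kappa}$ of κ by solving the general equation
$$g\left(\tfrac{1}{2},\tfrac{m}{2};\hat{\kappa}\right) := \frac{M'\left(\tfrac{1}{2},\tfrac{m}{2},\hat{\kappa}\right)}{M\left(\tfrac{1}{2},\tfrac{m}{2},\hat{\kappa}\right)} = \hat{\boldsymbol{\vartheta}}^{T}\mathbf{T}\hat{\boldsymbol{\vartheta}} =: \lambda, \qquad 0 \le \lambda \le 1,$$
such that λ=λ1 for the bipolar distribution and λ=λm for the girdle distribution. In this equation, M′ denotes the derivative of M with respect to κ. Writing a=1/2 and c=m/2, Sra and Karp () proposed the following three bounds:
$$L(\lambda) = \frac{\lambda c - a}{\lambda(1-\lambda)}\left(1 + \frac{1-\lambda}{c-a}\right),$$
$$B(\lambda) = \frac{\lambda c - a}{2\lambda(1-\lambda)}\left(1 + \sqrt{1 + \frac{4(c+1)\lambda(1-\lambda)}{a(c-a)}}\right),$$
$$U(\lambda) = \frac{\lambda c - a}{\lambda(1-\lambda)}\left(1 + \frac{\lambda}{a}\right).$$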
Assuming that κ(λ) is the solution to equation , Sra and Karp () have shown the following identities:
- for 1/m<λ<1 (positive solution, bipolar case) $$L(\lambda) < \kappa(\lambda) < B(\lambda) < U(\lambda)$$
- for 0<λ<1/m (negative solution, girdle case) $$L(\lambda) < B(\lambda) < \kappa(\lambda) < U(\lambda)$$
- and for λ=1/m, κ(λ)=L(λ)=B(λ)=U(λ)=0.
All three bounds (L,B,U) are also asymptotically precise at λ=0 and λ=1.
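The following sketch evaluates the three bounds and compares them with a numerical solution of M′/M = λ. It assumes the closed forms coded in `bounds` (with a = 1/2 and c = m/2), uses the identity M′(a, c, κ) = (a/c) M(a+1, c+1, κ), and solves by bisection between L and U; the series evaluation is meant for moderate |κ| only, and all function names are assumptions.

```python
import math

def kummer_m(a, c, x, tol=1e-15, max_terms=10_000):
    """Kummer's M(a, c, x) by direct power-series summation (moderate |x| only)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (c + k) * x / (k + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def kummer_ratio(a, c, kappa):
    """g(a, c; kappa) = M'(a, c, kappa) / M(a, c, kappa)."""
    return (a / c) * kummer_m(a + 1, c + 1, kappa) / kummer_m(a, c, kappa)

def bounds(lam, m):
    """Assumed closed forms of L, B, U for 0 < lam < 1, with a = 1/2 and c = m/2."""
    a, c = 0.5, m / 2
    base = (lam * c - a) / (lam * (1 - lam))
    lower = base * (1 + (1 - lam) / (c - a))
    middle = 0.5 * base * (1 + math.sqrt(1 + 4 * (c + 1) * lam * (1 - lam) / (a * (c - a))))
    upper = base * (1 + lam / a)
    return lower, middle, upper

def solve_kappa(lam, m, iterations=200):
    """Bisection on g(1/2, m/2; kappa) = lam, bracketed by L and U (g increases with kappa)."""
    a, c = 0.5, m / 2
    lo, _, hi = bounds(lam, m)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kummer_ratio(a, c, mid) < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical bipolar case: largest eigenvalue 0.6 on a 10-receptor network.
lam, m = 0.6, 10
L, B, U = bounds(lam, m)
kappa = solve_kappa(lam, m)
print(L, kappa, B, U)
```

In this bipolar example the numerical root falls between L and B, so the pair (L, B) gives a tighter bracket than (L, U).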
Schwartzman et al. () show that the quantity s=1−λ also measures clustering around the mean axis. From the definition of λ, one has
$$\lambda = \hat{\boldsymbol{\vartheta}}^{T}\mathbf{T}\hat{\boldsymbol{\vartheta}} = \frac{1}{p}\sum_{i=1}^{p}\left(\hat{\boldsymbol{\vartheta}}^{T}\boldsymbol{\nu}_i\right)^{2} = \frac{1}{p}\sum_{i=1}^{p}\cos^{2}\theta_i,$$
where θi is the angle between the axis νi and the mean axis ϑ, and so
$$s = 1-\lambda = \frac{1}{p}\sum_{i=1}^{p}\sin^{2}\theta_i$$
is the average sine‐squared of the angles with respect to the mean direction. If the data θ1, …, θp are tightly clustered, then λ will be almost 1, giving a value of s close to zero. On the other hand, if θ1, …, θp are widely dispersed, then λ will be small and s close to one. Sra and Karp () also show that s is the maximum likelihood estimate of when κ→+∞.
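As a small illustration, s can be computed directly as one minus the average squared cosine between each axis and the mean axis. The two-dimensional sample directions and the function name below are assumptions chosen to contrast a tight and a dispersed set.

```python
import math

def clustering_statistic(directions, axis):
    """s = 1 - lambda, the average sine-squared angle between each axis and the mean axis."""
    cos2 = [sum(a * b for a, b in zip(axis, nu)) ** 2 for nu in directions]
    lam = sum(cos2) / len(cos2)
    return 1.0 - lam

def direction_at(angle_deg):
    """Unit vector in the plane at the given angle from the first coordinate axis."""
    t = math.radians(angle_deg)
    return [math.cos(t), math.sin(t)]

axis = [1.0, 0.0]
tight = [direction_at(a) for a in (-5.0, 5.0)]
dispersed = [direction_at(a) for a in (0.0, 45.0, 90.0)]

s_tight = clustering_statistic(tight, axis)
s_dispersed = clustering_statistic(dispersed, axis)
print(round(s_tight, 4), round(s_dispersed, 4))
```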
The statistical analysis of the directional data is divided into the following steps: (i) transformation of concentration data vectors into axial data, (ii) computation of the principal axis and analysis of the distribution of axial data around the principal axis, (iii) estimation of the clustering parameter (κ) for the Watson distribution, and (iv) modeling of the Watson distribution with respect to the axial data.
The concentration data are sampled as vectors in m‐dimensional Euclidean space. A primary step involves the transformation of vectorial data into directional data on the hypersphere $S^{m-1}$. This is done by normalizing a weighted product of the data vectors with their weighted norm in view of the weighting matrix Hw (refer section ). The computation of Hw requires adjoint sensitivity vectors describing the source‐receptor relationship in the model space. We recall that the true source information is assumed to be unknown in this study. To generate the sensitivity elements, an adjoint dispersion model is required. For this purpose, an analytical forward dispersion model (Sharan, Yadav, Singh, Agarwal, et al. ) is utilized in the adjoint mode (Sharan et al., ). The dispersion model contains linear operators and utilizes an averaged wind speed and direction. Thus, the forward model can be run in the adjoint mode with two minor changes: (i) reversing the wind direction by 180° and (ii) replacing the source location by the receptor locations, assuming a unit release rate. The computation of the sensitivity vectors is performed on a domain of size 1,200 m × 1,200 m, discretized into 399 × 399 cells. The sensitivity matrix is derived from the solutions of the adjoint dispersion model with respect to each measurement (Pudykiewicz, ). Once the sensitivity matrix A is computed, the weight matrix W and the Gram matrix Hw can be computed using an algorithm described in Issartel et al. (). The matrix Hw is utilized for the transformation of the data into directions as mentioned in section . The directional data are converted into axial data by duplicating them with a negative sign of the vectors.
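The two changes that turn a forward model into its adjoint can be illustrated with a generic ground-reflected Gaussian plume. This is a swapped-in textbook model, not the analytical model of Sharan et al. used in the study; the linear growth of the plume spread, all parameter values, and the function names are assumptions.

```python
import math

def plume_concentration(source, receptor, wind_to_deg, speed=3.0, rate=1.0,
                        height=2.0, z=2.0, ay=0.08, az=0.06):
    """Ground-reflected Gaussian plume (illustrative stand-in).

    wind_to_deg is the direction the wind blows toward, in degrees
    counter-clockwise from the +x axis; sigma_y and sigma_z grow linearly
    with downwind distance (assumed coefficients ay, az).
    """
    ang = math.radians(wind_to_deg)
    dx, dy = receptor[0] - source[0], receptor[1] - source[1]
    downwind = dx * math.cos(ang) + dy * math.sin(ang)
    crosswind = -dx * math.sin(ang) + dy * math.cos(ang)
    if downwind <= 0.0:
        return 0.0  # the receptor lies upwind of the source
    sy, sz = ay * downwind, az * downwind
    lateral = math.exp(-crosswind ** 2 / (2 * sy ** 2))
    vertical = (math.exp(-(z - height) ** 2 / (2 * sz ** 2))
                + math.exp(-(z + height) ** 2 / (2 * sz ** 2)))
    return rate * lateral * vertical / (2 * math.pi * speed * sy * sz)

def sensitivity(receptor, cell, wind_to_deg):
    """Adjoint mode: unit release placed at the receptor, wind turned by 180 degrees."""
    return plume_concentration(receptor, cell, wind_to_deg + 180.0, rate=1.0)

source, receptor, wind = (0.0, 0.0), (200.0, 30.0), 10.0
forward = plume_concentration(source, receptor, wind, rate=1.0)
adjoint = sensitivity(receptor, source, wind)
print(forward, adjoint)
```

For this linear model with a constant wind, the sensitivity of the receptor to a unit release in a given cell equals the forward concentration at the receptor from that cell, so one adjoint run per receptor fills an entire row of source-receptor sensitivities.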
In the second step, an orientation matrix (section ) is computed for all the trials. The eigenanalysis of the orientation matrix yields the eigenvectors and eigenvalues of the axial data. The eigenaxis corresponding to the largest eigenvalue is considered the principal axis. For analyzing the distribution of the axial data around the principal axis, a polar projection is applied to the axial data. This is explained in section . In the third step, a numerical estimation of κ is performed for all the trials. The bounds L(λ) and B(λ) are estimated for κ using equations and , respectively.
In the fourth step, the Watson distribution is applied to model the axial data using the angular deviation between the axial data and the principal axis and the estimated κ parameter. In general, when the axial data are mainly scattered around the principal axis, a bipolar Watson distribution is applicable. On the contrary, if the axial data are concentrated around an equator (eigenaxis corresponding to the smallest eigenvalue) which lies in a plane normal to the principal axis, a Watson girdle distribution is applied. However, in this study, our framework is limited to modeling the bipolar Watson distribution. For the axial data, a histogram is constructed based on the angular distance of the directions from their principal axis. The probability density of a direction lying in a given angular interval [θ,θ+dθ] is defined as [Image Omitted. See PDF] where is a constant, and θ is the angle between the principal axis ϑ and the direction ν. Based on these, a probability distribution function can be computed for the directional data.
The results and analysis are presented here for the eigenanalysis of the orientation matrix, the distribution and clustering of axial data around the principal axis on the hypersphere, the bounds of κ, and the modeling of the Watson distribution in each trial. The discussion is organized by categorizing all the trials into one, two, three, and four release trials.
Eigenanalysis of the orientation matrix yields the eigenvalues and eigenaxes of the directional data. The eigenvalues are arranged in decreasing order such that λ1 ≥ λ2 ≥ … ≥ λm; thus, the first eigenvalue λ1 is the largest, and the eigenaxis corresponding to λ1 is considered the principal axis. For representation, the first four eigenvalues are shown in a bar chart for all trials (Figure ). Overall, λ1 is quite large (>0.75) in Trials 13, 23, 28, 30, and 49. Also, the ratio is observed as in these trials. On the other hand, λ1 is comparatively small (<0.3) in Trials 15, 16, 21, 43, 45, 47, 55, and 61. Several significant eigenvalues exist in these trials (for instance, ). Ideally, when there is only one large eigenvalue, the directions are expected to be scattered around the principal axis. When several significant eigenvalues exist, the directions are expected to be widely dispersed around several orthogonal axes.
Among the one point release trials, λ1 is relatively large (>0.6) in Trials 13, 14, and 30, while it is small (<0.3) in Trials 15, 16, and 45. In remaining trials, . The secondary eigenvalues are also significant in all the trials except Trials 13, 14, and 30. In the two point release trials, we do not see a large value of λ1 in any of the trials, as 0.37<λ1<0.56, though λ1 is small (≈0.4) in Trials 12, 18, 24, and 62 in comparison to the other trials. Also, the secondary eigenvalues are significant in the two release trials. Among the three point release trials, λ1 is quite large (>0.75) in Trials 23, 28, and 49. However, λ1 is small (<0.3) in Trials 21, 47, and 61. In remaining trials, . In the three release trials, the secondary eigenvalues are significant in all the trials except those with large λ1. Among the four point release trials, we do not see any large eigenvalue, as 0.26<λ1<0.54. The value of λ1 is small (≈0.26) in Trials 43 and 55. The secondary eigenvalues are significant in all the four release trials.
To represent a direction on the hypersphere $S^{m-1}$, its polar coordinates (r, ϕ1, ϕ2, …, ϕm−1) can be considered, where ϕ1, …, ϕm−1 represent the angular deviation of a direction from m−1 axes and r=1. Note that 0 ≤ ϕm−1 ≤ 2π, while all the other angular coordinates run over the range 0 to π. When the data are high dimensional, such a projection is, of course, impossible. Nevertheless, the first two angular coordinates can be computed, and the distribution of the directional data can be analyzed on a polar plot. Here our purpose is to analyze the distribution of the directional data mainly around the principal axis; thus, taking two angular coordinates (ϕ1 and ϕ2) is sufficient to understand the orientation of the directions. When the principal axis ϑ is chosen as the reference axis, θ corresponds to ϕ1, and so ϕ2 denotes the angular distance of the directions from the orthogonal axis.
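A sketch of how the first two angular coordinates might be computed, assuming the usual hyperspherical convention x1 = cos ϕ1 and x2 = sin ϕ1 cos ϕ2, with x1 and x2 taken as the components of a direction along the principal axis and along one chosen orthogonal axis. The function name and the choice of the second axis are assumptions; for m = 3 the second angle would need a full 0 to 2π treatment instead.

```python
import math

def first_two_angles(nu, axis, second_axis):
    """phi1: angle from `axis`; phi2: angle from `second_axis` within the part orthogonal to `axis`."""
    x1 = sum(a * b for a, b in zip(axis, nu))
    x2 = sum(a * b for a, b in zip(second_axis, nu))
    x1 = max(-1.0, min(1.0, x1))  # guard against round-off outside [-1, 1]
    phi1 = math.acos(x1)
    rest = math.sqrt(max(0.0, 1.0 - x1 * x1))  # equals sin(phi1)
    if rest == 0.0:
        return phi1, 0.0  # phi2 is undefined at the pole; 0 is used by convention
    phi2 = math.acos(max(-1.0, min(1.0, x2 / rest)))
    return phi1, phi2

# Hypothetical 4-dimensional example with coordinate axes as reference axes.
e1, e2, e3 = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]
print(first_two_angles(e1, e1, e2))
print(first_two_angles(e2, e1, e2))
print(first_two_angles(e3, e1, e2))
```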
The stereographic projection is obtained by representing directions as two‐dimensional polar coordinates on a half circle of unit radius as where (see Figure ). On the plot, blue dot denotes mean resultant direction, and red dot (at center) denotes principal axis. The directions are represented as black dots. The radial distance represents angular distance of a direction from the principal axis (i.e., ϕ1=θ), while the angular distance from the left axis represents angular deviation of a direction from the orthogonal axis (i.e., ϕ2). Note that, in the hypersphere, the principal axis represents the pole, while all the orthogonal axes are distributed around equator. The stereographic analysis helps to identify clusters in the directional data around the principal and corresponding orthogonal axis.
In the one release trials (Figure ), the spread of the directional data varies from trial to trial. The directions are distributed mainly around the principal axis (within 30°) only in Trials 13 and 30. The other trials show several clusters of directions lying 30–90° from the principal axis. The directions are widely dispersed in Trials 15, 22, 45, 46, and 54. In Trials 7, 15, and 16, a major part of the directions is distributed around the orthogonal axis. The directions are observed to vary around the orthogonal axis in Trials 15, 16, 45, 46, and 54. The mean resultant direction differs from the principal axis (on average, by 50°) in most of the trials.
In the two release trials (Figure ), the directional data are widely dispersed in all the trials. The dispersion among the directional data is large in Trials 17, 19, 24, and 62. In these trials, the directions also vary around the orthogonal axis. A cluster of directions is apparent in Trials 18 and 40 within 30–60° around the principal axis. As in the one release trials, the mean resultant direction deviates from the principal axis by about 50°. Among the three release trials (Figure ), the directions are mainly clustered within 30° in Trials 23, 28, and 49 and within 30–60° in Trials 41 and 47. In Trials 21, 47, and 61, a cluster of directions is also observed within 60–90°. In Trials 26 and 47, the directions show large variations along the orthogonal axis. The mean resultant direction deviates, on average, by 45° from the principal axis. In the four release trials (Figure ), clusters of directions are observed within 30–60° around the principal axis in all the trials. In these trials, another cluster is also observed orthogonal to the principal axis. In Trials 43 and 55, the directions have large variations along the orthogonal axis as well. The average deviation of the mean resultant direction from the principal axis is 55°.
Overall, in the three release trials, the directions are less dispersed than in the other release trials. In 50% of the trials, the directions show large variations around both the principal and orthogonal axes. Such trials are expected to be affected by frequent variations in atmospheric or wind conditions. By analyzing the time series of the concentration data, we observed that the occurrence of clusters is also related to variations in the concentration magnitudes, which could occur due to a varying number of sources or varying atmospheric conditions.
The distribution of the directional data is characterized as a probability density function by constructing histograms of latitudes (0<ϕ1<90°) with an angular spacing of 3°. The histogram for directional data observed between 0° and 90° is the same as that for data observed between 90° and 180°. The clustering in the directional data (in Figures ) appears as peaks in the histogram (Figure ). Most commonly, two peaks are observed in the histogram, one around the principal axis and the other orthogonal to it. The peaks orthogonal to the principal axis represent an accumulation of data distributed around the great circle, that is, orthogonal to the principal axis. By comparing the norms of the measurement vectors, we observed that the directional data around the orthogonal axis correspond to low norms. A flat peak indicates a relatively large variance in the data.
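The histogram construction described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the principal axis is taken as the leading eigenvector of the scatter matrix of the unit-normalized measurement vectors (found here by power iteration), and the latitude is assumed to be the angle between each direction and that axis, folded into 0–90°.

```python
import math

def unit(v):
    """Scale a measurement vector to unit length (its direction on the hypersphere)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def principal_axis(directions, iters=200):
    """Leading eigenvector of the scatter matrix (1/n) * sum(x x^T), by power iteration."""
    m = len(directions[0])
    axis = unit([1.0] * m)  # concentrations are nonnegative, so this start is not orthogonal to the axis
    for _ in range(iters):
        new = [0.0] * m
        for x in directions:
            dot = sum(a * b for a, b in zip(x, axis))
            for k in range(m):
                new[k] += dot * x[k]  # accumulates (scatter matrix) @ axis up to a constant factor
        axis = unit(new)
    return axis

def latitudes_deg(directions, axis):
    """Angle in [0, 90] degrees between each direction and the axis (sign of the direction ignored)."""
    out = []
    for x in directions:
        c = abs(sum(a * b for a, b in zip(x, axis)))
        out.append(math.degrees(math.acos(min(1.0, c))))
    return out

def histogram_density(angles, width=3.0, upper=90.0):
    """Histogram of angles with the given bin width, normalized to integrate to one."""
    nbins = int(round(upper / width))
    counts = [0] * nbins
    for a in angles:
        counts[min(int(a // width), nbins - 1)] += 1
    n = len(angles)
    return [cnt / (n * width) for cnt in counts]
```

Applied to one trial, `histogram_density(latitudes_deg(dirs, principal_axis(dirs)))` returns 30 bin densities; a peak in the first few bins corresponds to a cluster around the principal axis and a peak in the last bins to directions near the orthogonal great circle.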
In the one‐release trials, a major peak around the principal axis is observed in most trials (except Trial 16). A second peak orthogonal to the principal axis is also observed in Trials 7, 15, 16, 45, 46, and 54. Around the principal axis, peaks occur either within 20° (Trials 7, 13, 14, 15, 30, and 54) or between 40° and 60° (Trials 22, 45, 46, and 64). In the two‐release trials, a peak around the principal axis is observed in all trials, and a second, smaller peak orthogonal to the principal axis is observed in most trials (except Trials 18 and 27). The peak around the principal axis lies either within 20° (Trials 2, 17, 19, and 24) or between 30° and 50° (Trials 12, 18, 27, 40, and 62). Similarly, in all the three‐release trials, peaks around the principal axis are observed within 20° (Trials 23, 28, and 49) or between 30° and 50° (Trials 21, 26, 41, 47, and 61). In Trials 21, 26, 47, and 61, a large peak lies orthogonal to the principal axis. In the four‐release trials, the peak around the principal axis is observed within 30–60° in all trials except Trial 43. In all four‐release trials, including Trial 43, a peak also lies orthogonal to the principal axis.
Atmospheric conditions have a large impact on the plume dispersion process during a release, and they are correlated with the inherent variations in the data over space and time. In particular, atmospheric turbulence is random and difficult to parameterize or measure perfectly. Wind conditions during the release, however, can be measured accurately and analyzed for their relation to variations in the plume concentrations. In the FFT experiment, wind data were sampled in each trial for 1 hr (which includes the release duration) at a frequency of 10 Hz at the 4‐m height of a sonic tower located at the center of the receptor grid. The histograms of directional data and the wind rose plots are compared in Figure for representative trials.
Overall, wind speed varied between 1 and 7 m s−1, and the wind direction was mostly from south‐southeast or east‐southeast. The wind direction is quite variable in 60% of the trials (viz., Trials 2, 14, 17, 21, 22, 23, 24, 31, 40, 45, 46, 47, 54, 55, 62, and 64). Wind direction variability is very large (90–225°) in Trials 2, 17, 21, 23, 43, and 45. Similarly, wind speed varies considerably (2–6 m s−1) in Trials 7, 16, 24, 45, 54, and 55. Wind speeds below 2 m s−1 are considered low wind conditions, in which plume dispersion is challenging to model and not yet completely understood. In low wind conditions, the plume may show multiple local concentration peaks, meandering, and complex dispersion behavior (Anfossi et al., ; Sharan, Yadav, Singh, Agarwal, et al., ; Sharan, Yadav, & Singh, ; Yadav & Sharan, ). Pronounced low wind conditions are observed in Trials 2, 15, 17, 18, 19, 21, 22, 23, and 41.
Among the one‐release trials, wind conditions are steady only in Trials 13 and 30. Trials 14, 15, 22, and 46 correspond to relatively low wind conditions, whereas Trials 7, 16, 45, and 54 have large wind velocity fluctuations. Among the two‐release trials, steady wind conditions are observed only in Trials 12, 18, 19, and 27, although Trials 18 and 19 are associated with low wind conditions. The other trials (2, 17, 40, and 62) are associated with both large wind variability and low wind conditions. In Trial 24, the wind velocity fluctuations are large. Among the three‐release trials, steady and moderate wind conditions are observed in most trials except Trials 21 and 23, which correspond to low winds and large wind direction variability. Among the four‐release trials, wind conditions are relatively steady in Trials 31 and 50, while Trials 43 and 55 are cases of large wind direction variability and large wind velocity variability, respectively.
Comparing the wind condition variations with the histograms of directional data around the principal axis, we observed that the directions gather mainly around the principal axis (as a cluster) only in trials where wind conditions are steady and the variability in wind direction or wind speed is minimal (for instance, Trials 13 and 30). The angular deviations between the directions and their principal axis are relatively large in low wind conditions when the wind direction is not variable (for instance, Trials 18 and 41). The directions aggregate orthogonal to the principal axis when the wind direction is highly variable (for instance, Trials 2, 17, 21, and 43). The directional data are also largely dispersed when the wind speed is variable (as in Trials 16, 24, 45, and 55). Trial 23 is an exceptional case in which the wind speed is very low and the wind variability is relatively large, yet the directional data form a cluster around the principal axis.
Exact numerical estimation of κ is difficult and involves numerical errors, especially in high dimensions (in the present study, 5<m<80). However, the theoretical bounds (section ) proposed by Sra and Karp () can be estimated for κ in all the trials. With the FFT data set, the condition 1/m<λ1 always holds, and thus, L(λ) and B(λ) are the lower and upper bounds on κ (Figure ). We therefore take L(λ) as an approximate value of κ. Ideally, a higher value of κ indicates a tight clustering of directions around the principal axis, whereas a low value of κ corresponds to a large dispersion in the directions. Overall, κ is estimated to be quite large (>60) in Trials 13, 23, 28, 30, and 49 and comparatively small (<20) in Trials 2, 7, 12, 16, 17, 45, and 54.
In the one‐release trials, κ is higher in Trials 13, 30, and 64 and comparatively small in Trials 7, 14, 15, 16, 45, and 54. In the two‐release trials, κ mostly lies in the interval [25, 39], except in Trials 2, 12, and 17, where it is very small (<20). In the three‐release trials, κ is relatively large in Trials 23, 28, and 49 and smaller in the remaining trials. Among the four‐release trials, κ lies in [35, 53].
Further, to analyze the variation of κ with respect to the largest eigenvalue (λ1), the number of measurements (m, for a hypersphere of dimension m−1), and the average angular deviation of the directional data from the principal axis (θ), a scatter plot comparison is made for all the trials (Figure ). The figure shows that κ varies linearly with λ1, m, and θ. Overall, we observed that κ is large when the first eigenvalue is very large (λ1>0.7), the number of measurements is large (m>30), and the average angular deviation is small (<30°). However, in Trials 23 and 30, κ is still relatively high (≈60) although the number of measurements is very small (m<20). In most trials (except 13, 23, 28, 30, and 49), κ lies within the interval [0.5m,m], while in Trials 13, 23, 28, 30, and 49, κ lies within the interval [2m,4m].
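To make the role of the bounds concrete, the following sketch evaluates a lower and an upper bound on κ from λ1 and m and, for comparison, solves the maximum-likelihood equation numerically by bisection between them. It rests on several assumptions: the section defining L(λ) and B(λ) is not reproduced here, so the two expressions are written in the form given by Sra and Karp for the equation g(a, c; κ) = λ1 with a = 1/2 and c = m/2, where g is the logarithmic derivative of Kummer's function M(a, c, κ), and should be checked against that section. The function names are hypothetical.

```python
import math

def kummer_ratio(a, c, kappa, terms=5000):
    """g(a, c; kappa) = M'(a, c, kappa) / M(a, c, kappa) for kappa >= 0.

    Uses the power series of Kummer's function M and the identity
    M'(a, c, k) = (a / c) * M(a + 1, c + 1, k). Plain floats overflow
    for kappa above roughly 700; the values reported here are far below that.
    """
    def series(a_, c_):
        term, total = 1.0, 1.0
        for n in range(terms):
            term *= (a_ + n) / (c_ + n) * kappa / (n + 1)
            total += term
            if term < 1e-16 * total:
                break
        return total
    return (a / c) * series(a + 1.0, c + 1.0) / series(a, c)

def watson_kappa_bounds(lam1, m):
    """Assumed forms of the lower bound L and upper bound B, valid for 1/m < lam1 < 1."""
    a, c = 0.5, m / 2.0
    if not (a / c < lam1 < 1.0):
        raise ValueError("bounds assumed only for 1/m < lam1 < 1")
    base = (lam1 * c - a) / (lam1 * (1.0 - lam1))
    lower = base * (1.0 + (1.0 - lam1) / (c - a))
    upper = 0.5 * base * (1.0 + math.sqrt(
        1.0 + 4.0 * (c + 1.0) * lam1 * (1.0 - lam1) / (a * (c - a))))
    return lower, upper

def solve_kappa(lam1, m, tol=1e-10):
    """Bisection for g(1/2, m/2; kappa) = lam1 inside [L, B]; g increases with kappa."""
    lo, hi = watson_kappa_bounds(lam1, m)
    a, c = 0.5, m / 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kummer_ratio(a, c, mid) < lam1:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For small m and moderate κ the series is cheap to evaluate; the closed-form lower bound avoids this evaluation altogether, which is the motivation stated above for using it as the approximate κ.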
The theoretical Watson distribution (section ) is applied to model the directional data using two parameters: the angular deviation between the directions and the principal axis, and the clustering parameter κ. The goodness‐of‐fit of the Watson distribution to the directional data is evaluated with the Kolmogorov‐Smirnov (KS) test at the 5% level of significance. The null hypothesis is “The observed directional data follow the theoretical Watson distribution.” The directional data are widely distributed around the principal axis as well as other orthogonal axes in the form of multiple clusters, so a single Watson distribution cannot model all clusters at once. A single Watson distribution achieves an approximate fit in 40% of the trials (viz., Trials 12, 18, 21, 22, 23, 27, 31, 41, 47, 54, 55, 62, and 64), and the KS test accepts the null hypothesis in these trials. In the other trials, however, the Watson distribution fits only one cluster around the principal axis, often overpredicting it, while the other clusters (including those orthogonal to the principal axis) are not modeled.
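As an illustration of such a goodness-of-fit check, the sketch below computes a one-sample KS statistic between observed angular deviations and the angular distribution implied by a Watson law. It is not the authors' procedure. It assumes that the angle θ between a direction and the principal axis, folded into 0–90°, has density proportional to exp(κ cos²θ) sin^(m−2) θ on the (m−1)-sphere, integrates that density numerically, and uses the large-sample 5% critical value 1.36/√n, which ignores the fact that κ is estimated from the same data. All function names are hypothetical.

```python
import math

def watson_angle_cdf(kappa, m, n_grid=9000):
    """Tabulated CDF of the angle from the principal axis on [0, pi/2] (midpoint rule)."""
    step = (math.pi / 2.0) / n_grid
    logs = []
    for i in range(n_grid):
        t = (i + 0.5) * step
        logs.append(kappa * math.cos(t) ** 2 + (m - 2) * math.log(math.sin(t)))
    top = max(logs)                      # subtract the maximum to avoid overflow
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    cdf, acc = [0.0], 0.0
    for w in weights:
        acc += w
        cdf.append(acc / total)          # cdf[i] = P(theta <= i * step)
    return step, cdf

def ks_statistic(angles_deg, kappa, m):
    """Largest gap between the empirical CDF of the angles and the model CDF."""
    step, cdf = watson_angle_cdf(kappa, m)

    def model_cdf(deg):
        pos = min(max(math.radians(deg) / step, 0.0), float(len(cdf) - 1))
        i = min(int(pos), len(cdf) - 2)
        return cdf[i] + (pos - i) * (cdf[i + 1] - cdf[i])  # linear interpolation

    xs = sorted(angles_deg)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = model_cdf(x)
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d

def ks_reject_5pct(d, n):
    """True if the null hypothesis is rejected at the 5% level (asymptotic critical value)."""
    return d > 1.36 / math.sqrt(n)
```

A rightward shift of the model peak relative to the observed peak shows up directly as a large gap between the two CDFs, which is one way a visually reasonable fit can still be rejected.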
Alternatively, a mixture of Watson distributions can be applied to model each cluster separately by estimating the corresponding principal axis and κ. Here we use an Expectation‐Maximization (EM) algorithm (Appendix ), originally proposed by Bijral et al. (), to separate the multiple clusters around the principal axis and to estimate the corresponding κ parameter. Before applying the EM algorithm, we grouped the trials into three categories based on the distribution of the directional data (Figure ): (i) Trials 13, 18, 23, 28, 30, and 41 with one cluster; (ii) Trials 19, 31, 40, 45, 47, 49, 50, and 54 with two clusters; and (iii) the remaining trials with four clusters (Figure ). For the four‐cluster category, results are shown only for Trials 12 and 24 (Figure ).
In the trials with one cluster (13, 18, 23, 28, 30, and 41), a visual comparison between the observed and modeled distributions shows that the Watson distribution approximately represents the observed data characteristics. The theoretical Watson peak is often slightly overpredicted (within a factor of 1.5) and shifted to the right of the observed peak by up to 10°. The κ estimated by the EM algorithm is approximately equal to the lower‐bound κ values given in Figure . As discussed before, κ is relatively small (<37) in Trials 18 and 41 and comparatively large (>60) in the rest of these trials. The KS test accepts the null hypothesis in Trials 18, 23, and 41 and rejects it in Trials 13, 28, and 30. A major reason for the rejection is the right shift in the peak of the theoretical distribution. Overall, in Trials 18, 23, and 41, the KS test supports the hypothesis that the directional data follow the Watson distribution.
In Trials 19, 31, 40, 45, 47, 49, 50, and 54 (Figure ), the directional data are divided into two clusters, which are modeled separately with the Watson distribution using the EM algorithm, and κ is estimated for each cluster. For one of the clusters, the EM algorithm estimates κ close to the lower bound L(λ) (within 10% variation), while for the other cluster, κ is relatively large, in the range 70–300. In most trials (except Trials 49 and 50), κ is <40 for one of the clusters; in Trials 19, 45, and 54, κ ≤ 22. The theoretical Watson distribution often overpredicts the observed peaks (within a factor of 3) in most trials, except for one of the clusters in Trials 47 and 54. The KS test accepts the null hypothesis for both clusters in Trials 31, 47, and 54 and for one of the clusters in Trials 19, 40, 45, and 50. Overall, in most of the trials in this category, we obtain a good Watson fit for at least one of the clusters around the principal axis.
The remaining 18 trials (2, 7, 12, 14, 15, 16, 17, 21, 22, 24, 26, 27, 43, 46, 55, 61, 62, and 64) are associated with a large dispersion in the directional data. The EM algorithm is therefore applied to these trials with four clusters, and κ is estimated for each cluster. The κ values differ from cluster to cluster and from trial to trial. In all these trials, the smallest of the four κ values coincides with the lower bound L(λ), while for the other three clusters, κ often exceeds 100. Fitting the theoretical Watson distribution is relatively difficult and unsatisfactory in these trials: among the four clusters, at most two are well modeled. In most trials, the theoretical Watson distribution overpredicts the observed peaks (mostly within a factor of 3) and is shifted to the right of them (within 10°). The KS test accepts the null hypothesis for three clusters in Trial 14, for two clusters in Trials 2, 12, 24, 55, and 64, and for one cluster in Trials 7, 27, 43, 61, and 62. Overall, in 60% of the trials in this category, at least one of the four clusters around the principal axis follows the Watson distribution.
We noticed that the Watson distribution fits relatively well in trials where both the first eigenvalue and κ are comparatively large (i.e., λ1 ≥ 0.7 and κ ≥ 60). However, the occurrence of several significant eigenvalues does not necessarily imply a large value of κ, for instance, in Trials 2, 14, and 40. In Trials 18 and 41, on the other hand, we still see a good Watson fit in spite of significant secondary eigenvalues and κ<40. This could be due to the existence of one large cluster of directions dispersed within a small angular interval around the principal axis only.
The directional analysis of the FFT data highlights interesting features that differ among trials. A correspondence between meteorological variations and the distribution of directional data can be established in most of the trials. The measurements are strongly affected by variations in both wind direction and wind speed. Tight clustering of the directional data is observed only with steady wind conditions and moderate wind speed. The directions are widely distributed when wind conditions are highly variable or when the wind speed is low. In fact, in low wind conditions, the variation in measurements is difficult to determine or predict (for instance, in Trials 21 and 23). We observed that the impact of atmospheric conditions is greater than that of a varying number of source terms. It is difficult to determine the number of source terms involved in the data by analyzing only the clustering of its directions.
The study shows that the Watson parameter κ can indicate and summarize the variations in the measurements in each trial. The standard error in the Watson distribution decreases as κ increases (Mardia & Dryden, ); thus, standard errors are small when κ is large and large when the directions are widely distributed. In trials associated with higher values of κ, the directions are tightly clustered and do not exhibit a large spread; in such cases, the measurements are steady. When the directions are tightly clustered, the measurements can be expected to be reproducible and to be better represented by dispersion models. Accordingly, the model errors would be small for such data, and dispersion models will have better predictability in forward modeling. Similarly, in inverse modeling or source term estimation with such data, there will be more confidence in the retrieved source estimates, and the a posteriori uncertainty will be small. On the contrary, small values of κ indicate large variation in the data. Model errors can therefore be expected to be large with such data, and dispersion models can hardly represent the data characteristics correctly. Similarly, in inverse modeling, the source estimates will have large uncertainties. The Watson distribution is able to model the directional data derived from concentration measurements, especially for a steady concentration state and steady wind conditions. A mixture of Watson distributions can be applied to model directional data with multiple clusters. Moreover, with the proposed framework of analysis, it is feasible to characterize which part of the data is clustered and which is widely distributed (has large variations); such analysis thus has the potential to further explain modeling predictability and uncertainties with respect to the data.
An exact fit of the theoretical Watson distribution to the observed directional data in the FFT experiment is not straightforward. However, the FFT data form an interesting and challenging data set from a statistical point of view, since they are affected by both turbulence and simultaneous releases, whose effects are difficult to separate when several clusters of data are observed. There are issues with respect to both the real data and the Watson modeling. The real concentration data are associated with an unknown random structure and intensity of turbulence, meteorological uncertainties, and multiple simultaneous releases, which pose complex conditions for fitting a theoretical parametric law exactly. The occurrence of multiple direction clusters is more plausible in multiple‐release trials than in single‐release trials. Variations in meteorological conditions might be responsible for the poor Watson fit in the one‐release trials, and perhaps the concentrations are not purely steady in trials with very low κ. Multiple clusters of directions may appear in multiple‐release trials due to variations in the source term or due to the mixed effect of meteorological variation and atmospheric turbulence. However, these effects are difficult to determine and to distinguish from each other. Another major limitation is the size of the directional data. In FFT, data were collected only for a short time interval (<10 min); thus, only a limited number of uncorrelated directional data were available, with which it is difficult to fit a parametric distribution exactly. From a modeling point of view, the exact number of clusters is difficult to determine in a high‐dimensional data set. Also, κ is approximated from its bounds and from the EM algorithm, since its numerical estimation is computationally expensive.
The study highlights the importance of analyzing the directional part of the data for gaining insight into concentration measurement variations in air pollution modeling. It presents a detailed analysis of concentration data collected from 32 continuous release trials (involving multiple sources) conducted in the FFT experiments. The data are analyzed through eigenanalysis, stereographic projection, and histograms of the angular deviation of the directions from the principal axis. The study provides insight into the distribution of directional data with respect to variations in meteorology. It shows that, in ideal steady conditions, the directional data are able to follow the Watson distribution. The clustering parameter κ is shown to be efficient in characterizing the variations in the directional data.
Eigenanalysis demonstrates that the steadiness of the concentration data is related to the magnitude of the largest eigenvalue. Significant secondary eigenvalues indicate variation in the concentration measurements. With the FFT data, the first eigenvalue is quite large in only 30% of the trials. In most trials, the directional data are distributed along several orthogonal axes. With polar stereographic projection plots, the distribution and clustering of the directional data are analyzed around the principal axis and its orthogonal axis. In most trials, one cluster of directions exists within 30° of the principal axis, but the trials are often associated with multiple clusters (ranging from two to four) between 30° and 60° from the principal axis. A high frequency of directions is also observed orthogonal to the principal axis in most trials. Note that the directions aggregated orthogonal to the principal axis are those distributed around the different axes orthogonal to the principal axis. By analyzing the norms of the measurement vectors, we observed that directions associated with higher norms are closer to the principal axis than those with smaller norms. With the theoretical bound on κ, we observed that κ is quite large in trials where the data contain only one cluster of directions and the first eigenvalue is quite large.
Overall, in 60% of the FFT trials, the Watson law statistically fits the observed cluster of directions around the principal axis. A mixture of Watson distributions is able to model directional data distributed in the form of multiple clusters. In most trials, the theoretical Watson distribution overpredicts the peaks and is shifted to the right of the observed directional distribution. The fit is not discouraging, since real data are influenced by an unknown random structure of turbulence and by meteorological uncertainties and are thus difficult to fit exactly with a theoretical distribution. These limitations motivate future work examining the directional statistics of concentration data generated in synthetically controlled turbulent situations or in idealized scenarios.
The governing advection‐diffusion equation (accounting advection along mean wind and diffusion in all the three directions) for a nonreacting, nondepositing tracer released from an elevated point source located at (x0,y0,z0) with strength q is written as [Image Omitted. See PDF] where c is the predicted mean concentration, U is the mean wind speed, (Kx,Ky,Kz) is the diffusion tensor, and s=qδ(x−x0)δ(y−y0)δ(z−z0) is the source term. The δ(.) denotes Dirac‐delta function. The x axis is oriented in the mean wind direction. An analytical solution for equation with relevant boundary conditions is developed by Sharan, Yadav, Singh, Agarwal, et al. () as [Image Omitted. See PDF] with and in which z0 is the release height and (σx,σy,σz) are the dispersion parameters along x, y, and z directions, respectively. This model requires the values of wind, dispersion parameters, and atmospheric stability.
In the FFT07 experiment, the turbulent wind velocity fluctuations were measured and given in the data set. Therefore, in this study, the dispersion parameters are derived from the measurements of wind velocity fluctuations in the lateral and vertical directions. Gryning et al. () have proposed a method to compute the dispersion of plume in terms of σy and σz from standard deviations of the corresponding velocity fluctuations in the lateral (σv) and the vertical (σw) directions. This is written as [Image Omitted. See PDF] where t is the travel time and Ty and Tz are the Lagrangian time scales in the lateral and the vertical directions, respectively. The value for Ty is taken as 600 s (Irwin, ), whereas values for Tz (in seconds) are taken from Gryning et al. () as [Image Omitted. See PDF] in which L is Monin‐Obukhov length. In this study, σx is assumed to be same as σy. The inputs to the analytical dispersion model are provided as wind speed, wind direction, wind velocity fluctuations, and atmospheric stability at 4‐m level.
The adjoint model for the forward transport equation can be written as (Pudykiewicz, ) [Image Omitted. See PDF] in which πi=δ(x−xr), where xr denotes the position of the receptors and ai is the sensitivity of the ith location with respect to the measurement. The forward and adjoint equations differ mainly in two terms: (i) the sign of the wind is negative, and (ii) the source term is replaced by the receptors measuring the concentrations. Owing to the symmetry of the advection and diffusion operators, the solution developed for the forward dispersion model can be used to generate the elements of the matrix A with two minor changes: (i) the wind direction is inverted by 180°, and (ii) the source location is replaced by the receptors emitting a unit amount of tracer per unit time. The sensitivities have peaks in the cells coinciding with the sampler locations, which spread to the neighboring cells. This introduces an artificial influence (a numerical artifact) in the source identification (Issartel et al., ). To resolve this, the cell containing the sampler and the neighboring cells are further subdivided into 99×99 cells, and an average value of the sensitivity coefficient is computed for the receptor cell.
Consider a generative model for directional data as a mixture of K Watson distributions (Bijral et al., ). Let fj(ν|ϕj) be one Watson component for a class corresponding to the parameters ϕj=(ϑj,κj), with 1 ≤ j ≤ K. The density for a point generated by this model is then the weighted sum of the components, f(ν|Φ)=α1f1(ν|ϕ1)+…+αKfK(ν|ϕK), where Φ=(α1,…,αK,ϕ1,…,ϕK) and αj are the mixing proportions that sum to one.
E‐step: Compute for all points νi and mixture components 1 ≤ j ≤ K
M‐step: Update αj, ϑj, and κj for all mixture components [Image Omitted. See PDF]
Solve the following nonlinear equation for ϑj [Image Omitted. See PDF]
Compute T using ϑj [Image Omitted. See PDF]
[Image Omitted. See PDF]
The E‐step returns the posterior probabilities p(j|νi,Φ) of all the classes, given the point.
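The E‐step can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation of Bijral et al.: the Watson density on the unit sphere in m dimensions is assumed to be Γ(m/2) / (2π^(m/2) M(1/2, m/2, κ)) · exp(κ (ϑᵀν)²) with κ ≥ 0, the Kummer function M is summed from its power series, and the update of the mixing proportions is the usual EM average of the posteriors. The function names are hypothetical, and the updates of ϑj and κj are not shown because their equations are not reproduced in this text.

```python
import math

def log_kummer(a, c, kappa, terms=5000):
    """log M(a, c, kappa) for kappa >= 0, summing the power series of Kummer's function."""
    term, total = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) / (c + n) * kappa / (n + 1)
        total += term
        if term < 1e-16 * total:
            break
    return math.log(total)

def watson_log_density(x, axis, kappa):
    """Log of the assumed Watson density at the unit vector x."""
    m = len(x)
    dot = sum(p * q for p, q in zip(x, axis))
    log_norm = (math.lgamma(m / 2.0) - math.log(2.0)
                - (m / 2.0) * math.log(math.pi)
                - log_kummer(0.5, m / 2.0, kappa))
    return log_norm + kappa * dot * dot

def e_step(points, alphas, axes, kappas):
    """Posterior probability p(j | point) of every component j, for every point."""
    posteriors = []
    for x in points:
        logs = [math.log(a) + watson_log_density(x, mu, k)
                for a, mu, k in zip(alphas, axes, kappas)]
        top = max(logs)                    # log-sum-exp shift for numerical stability
        w = [math.exp(v - top) for v in logs]
        s = sum(w)
        posteriors.append([v / s for v in w])
    return posteriors

def update_alphas(posteriors):
    """Mixing proportions as the average posterior weight of each component."""
    n = len(posteriors)
    k = len(posteriors[0])
    return [sum(row[j] for row in posteriors) / n for j in range(k)]
```

Working with log densities keeps the posteriors well defined even for the large κ values (70–300) reported for some clusters, where the unnormalized densities differ by many orders of magnitude.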
The FFT07 database is available online (
© 2019. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The statistics of concentration data, measured during unknown atmospheric dispersion events, are not fully understood although they are required in modeling, assessment, uncertainty analysis, and information fusion. The concentrations measured over a monitoring network are regarded as a vector, which has both magnitude and direction. Traditional statistics (mean, standard deviation, etc.) based on the magnitude of concentration data summarize data properties but are limited in characterizing variations in the data and their modeling quality. By comparison, directions provide valuable information to address these issues. Here we propose a statistical framework that transforms concentration measurements into directions projected on a hypersphere and analyzes their orientation and distribution. Directional data measured in identical conditions are expected to be rotationally symmetric around their principal axis and to follow the Watson distribution. The clustering parameter of the Watson distribution measures the tightness of the directional data and, thus, indirectly measures variations in the observed data. It is shown that the clustering parameter is able to summarize the overall variation in the data and the modeling quality of the data in a dispersion trial. The study analyzes real data taken from continuous release experiments, called “Fusion Field Trials,” conducted at Dugway Proving Ground, Utah, United States.